A Deep Dive into Regular Expressions in JavaScript
Regular expressions, often referred to as regex, are powerful tools used for pattern matching and manipulation of strings.
In JavaScript, regular expressions are represented by objects of the RegExp class.
They provide a concise and flexible way to search, match, replace, and validate strings based on specific patterns.
Creating Regular Expressions:
Regular expressions can be created using either the constructor syntax or the literal syntax.
Here's an example of both:
Constructor Syntax:
let regex = new RegExp("pattern");
Literal Syntax:
let regex = /pattern/;
In the above examples, "pattern" represents the regular expression pattern you want to match.
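One practical difference between the two syntaxes is escaping. The following is a minimal sketch, not a definitive recipe: escapeRegExp is a hypothetical helper (it is not a built-in), and keyword stands in for text supplied at runtime. It shows that a backslash must be doubled inside a constructor string, and that the constructor is the natural choice when the pattern is built from a variable.

```javascript
// Literal syntax: the backslash is written once.
const digitsLiteral = /\d+/;

// Constructor syntax: the pattern is a string, so the backslash must be doubled.
const digitsConstructor = new RegExp("\\d+");
console.log(digitsLiteral.source === digitsConstructor.source); // true

// Hypothetical helper: escapes characters that have a special meaning in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern from a runtime value; modifiers go in the second argument.
const keyword = "c++"; // assumed user input
const keywordRegex = new RegExp(escapeRegExp(keyword), "i");
console.log(keywordRegex.test("I like C++ and JavaScript")); // true
```

The literal syntax is usually easier to read for fixed patterns, so the constructor tends to be reserved for cases like the one above.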
Methods That Work with Regular Expressions:
1. test(): This method checks if a pattern matches a string and returns true or false.
let regex = /hello/;
console.log(regex.test("hello world")); // Output: true
2. exec(): This method searches a string for a match and returns an array containing information about the match. If no match is found, it returns null.
let regex = /world/;
console.log(regex.exec("hello world")); // Output: ["world", index: 6, input: "hello world", groups: undefined]
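3. match(): This String method searches a string using a pattern. Without the g modifier it returns the first match in the same form as exec(); with the g modifier it returns an array of all matched substrings. If no match is found, it returns null.
let regex = /lo/;
console.log("hello world".match(regex)); // Output: ["lo", index: 3, input: "hello world", groups: undefined]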
4. search(): This method searches a string for a specified pattern and returns the index of the first match. If no match is found, it returns -1.
let regex = /world/;
console.log("hello world".search(regex)); // Output: 6
5. replace(): This String method searches a string for a specified pattern and returns a new string with the match replaced. Only the first match is replaced unless the pattern has the g modifier.
let regex = /world/;
console.log("hello world".replace(regex, "universe")); // Output: "hello universe"
6. split(): This method splits a string into an array of substrings based on a specified pattern.
let regex = /,/;
console.log("apple,banana,orange".split(regex)); // Output: ["apple", "banana", "orange"]
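The methods above have a few behaviors that are easy to miss: capture groups, the effect of the g modifier, and patterns as separators. The sketch below uses made-up sample strings and variable names (timeMatch, text, swapped, parts) purely for illustration.

```javascript
// exec(): the returned array also carries capture groups and the match position.
const timeMatch = /(\d{2}):(\d{2})/.exec("Meeting at 09:30 today");
console.log(timeMatch[0], timeMatch[1], timeMatch[2]); // "09:30" "09" "30"
console.log(timeMatch.index); // 11

// match(): without g it reports the first match; with g it lists every match.
const text = "hello world, hello regex";
console.log(text.match(/hello/).index); // 0
console.log(text.match(/hello/g));      // ["hello", "hello"]
console.log(text.match(/xyz/g));        // null, not an empty array

// replace(): only the first match is replaced unless g is set.
console.log(text.replace(/hello/, "bye"));  // "bye world, hello regex"
console.log(text.replace(/hello/g, "bye")); // "bye world, bye regex"

// $1 and $2 in the replacement string refer to the capture groups.
const swapped = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1");
console.log(swapped); // "John Doe"

// split(): the pattern can absorb whitespace around mixed separators.
const parts = "apple, banana;orange ,  kiwi".split(/\s*[,;]\s*/);
console.log(parts); // ["apple", "banana", "orange", "kiwi"]
```

Because match() returns null rather than an empty array when nothing is found, code that iterates over the result usually needs a null check first.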
Common Symbols and Modifiers:
Regular expressions in JavaScript use various symbols and modifiers to define patterns.
Here are some commonly used ones:
'.': Matches any single character except a newline.
[]: Defines a character set and matches any single character within it.
^: Matches the start of a string (or of each line when the m modifier is set).
$: Matches the end of a string (or of each line when the m modifier is set).
*: Matches zero or more occurrences of the preceding element.
+: Matches one or more occurrences of the preceding element.
?: Matches zero or one occurrence of the preceding element.
|: Acts as an OR operator, allowing multiple alternatives.
\: Escapes a special character or indicates a special sequence.
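To make the symbols above concrete, here is a small sketch that tries each one against a short input. The object name symbolDemo and the sample strings are chosen only for illustration.

```javascript
const symbolDemo = {
  dot: /h.t/.test("hot"),               // true:  . stands in for "o"
  set: /[aeiou]/.test("sky"),           // false: no character from the set
  start: /^hello/.test("hello you"),    // true:  "hello" is at the start
  end: /world$/.test("world cup"),      // false: "world" is not at the end
  star: /ab*c/.test("ac"),              // true:  zero "b" characters is allowed
  plus: /ab+c/.test("ac"),              // false: at least one "b" is required
  optional: /colou?r/.test("color"),    // true:  the "u" may be absent
  alternation: /cat|dog/.test("hotdog"), // true:  the "dog" alternative matches
  escaped: /1\.5/.test("145"),          // false: \. matches only a literal dot
  unescaped: /1.5/.test("145"),         // true:  . matches the "4"
};
console.log(symbolDemo);
```

The last two entries show why escaping matters: an unescaped dot silently accepts any character in that position.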
Example:
Let's say we want to check whether a string has the format of a valid email address. We can use the following regular expression:
let emailRegex = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;
let email = "example@example.com";
console.log(emailRegex.test(email)); // Output: true
email = "invalid.email";
console.log(emailRegex.test(email)); // Output: false
In the above example, the regular expression is used to validate an email address. It checks that the address has the expected format: a username, the @ symbol, a domain name, and a domain extension. The pattern breaks down as follows:
^: Matches the start of the string, so the email address must begin with the pattern that follows.
\w+: \w matches any word character (a letter, digit, or underscore), and + requires one or more of them. This is the first part of the username.
([\.-]?\w+)*: Matches an optional dot or hyphen ([\.-]?) followed by one or more word characters (\w+). The * lets this group occur zero or more times, which allows dots or hyphens inside the username, as in first.last.
@: Matches the literal "@" character.
\w+: Matches one or more word characters, the first part of the domain name.
([\.-]?\w+)*: The same group as in the username part. It allows dots or hyphens inside the domain name.
(\.\w{2,3})+: Matches the domain extension: a dot followed by two or three word characters. The + lets this group occur one or more times, which allows multi-part extensions such as .co.uk.
$: Matches the end of the string, so nothing may follow the domain extension.
In summary, the regular expression ^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$ checks for the following conditions in an email address:
Starts with one or more word characters for the username.
Allows optional dots or hyphens followed by one or more word characters in the username.
Contains the "@" symbol.
Followed by one or more word characters for the domain name.
Allows optional dots or hyphens followed by one or more word characters in the domain name.
Ends with one or more domain extensions, each consisting of a dot followed by two or three word characters.
By using the test() method of the regular expression object, we can check whether a given string matches this pattern and thus whether it has the shape of a valid email address.
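It can help to probe a pattern like this with a few edge cases before relying on it. The sketch below reuses the same emailRegex; the sample addresses are invented for illustration. It suggests that the pattern is a simplification: it accepts common addresses but rejects some that are valid in practice.

```javascript
const emailRegex = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

// Invented sample addresses, with the result each one produces.
const samples = [
  "first.last@example.com",  // true:  a dot inside the username is allowed
  "user@mail.example.co.uk", // true:  the extension group repeats (.co, .uk)
  "user@example.info",       // false: the extension is limited to 2 or 3 characters
  "user+tag@example.com",    // false: "+" is not a word character
  "user@example",            // false: there is no domain extension
];

const results = samples.map((address) => emailRegex.test(address));
samples.forEach((address, i) => console.log(address, results[i]));
```

If longer extensions need to pass, one option is to relax \w{2,3} to \w{2,}, at the cost of accepting more malformed input.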
Another Example:
let validationPattern = /^(0|[1-9][0-9]*)$/;
Let's break down the regular expression:
/: The forward slashes at the beginning and end of the regular expression delimit the pattern.
^: This symbol represents the start of the input string.
(0|[1-9][0-9]*): This group is used to define two alternative patterns:
0: Matches the digit 0 exactly once.
[1-9][0-9]*: Matches any digit from 1 to 9 once, followed by zero or more digits from 0 to 9.
$: This symbol represents the end of the input string.
With this regular expression, an input string is valid only if it is a single zero or a sequence of digits that starts with a digit from 1 to 9. In other words, it accepts non-negative integers with no leading zeros.
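The breakdown above can be checked against a handful of inputs. The input list and the name integerResults are chosen here only for illustration.

```javascript
const validationPattern = /^(0|[1-9][0-9]*)$/;

// Pair each sample input with the result of testing it.
const integerResults = ["0", "7", "42", "100", "007", "-5", "4.2", ""].map(
  (input) => [input, validationPattern.test(input)]
);

for (const [input, isValid] of integerResults) {
  console.log(JSON.stringify(input), isValid);
}
// "0" true, "7" true, "42" true, "100" true
// "007" false (leading zeros), "-5" false (sign), "4.2" false (decimal point), "" false (empty)
```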
Another Example:
Matching Dates:
let dateRegex = /^\d{2}-\d{2}-\d{4}$/;
console.log(dateRegex.test("05-12-2023")); // Output: true
console.log(dateRegex.test("2023-12-05")); // Output: false
In the above example, the regular expression ^\d{2}-\d{2}-\d{4}$ matches a date string in the format "dd-mm-yyyy".
It consists of two digits for the day, followed by a hyphen, two digits for the month, another hyphen, and finally four digits for the year.
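A pattern like this checks the shape of the string, not the calendar, so a string such as "99-99-9999" still matches. One possible way to go further is sketched below: capture groups pull out the parts, and the built-in Date object checks that they form a real date. The helper name isValidDate is hypothetical, and the approach is only one of several.

```javascript
// Same pattern as above, with a capture group around each part.
const dateRegex = /^(\d{2})-(\d{2})-(\d{4})$/;

// Hypothetical helper: checks the shape first, then the calendar.
function isValidDate(text) {
  const match = dateRegex.exec(text);
  if (match === null) return false;

  const day = Number(match[1]);
  const month = Number(match[2]);
  const year = Number(match[3]);

  // Date rolls impossible values over (31-02 becomes a day in March),
  // so compare the parts back against what was parsed.
  // Note: Date maps years 0 to 99 to 1900 to 1999, so such years are rejected here.
  const date = new Date(year, month - 1, day);
  return (
    date.getFullYear() === year &&
    date.getMonth() === month - 1 &&
    date.getDate() === day
  );
}

console.log(isValidDate("05-12-2023")); // true
console.log(isValidDate("31-02-2023")); // false: February has no 31st
console.log(isValidDate("99-99-9999")); // false: right shape, impossible date
console.log(isValidDate("2023-12-05")); // false: wrong shape
```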
Another Example:
Extracting numbers from a string:
let numberRegex = /\d+/g;
let text = "I have 3 apples and 5 oranges.";
console.log(text.match(numberRegex)); // Output: ["3", "5"]
In the above example, the regular expression \d+ matches one or more consecutive digits. The g modifier is used to find all occurrences of the pattern in the given string.
The match() method returns an array of all the matched numbers.
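The matched values are strings, so a conversion step is needed before doing arithmetic with them. A minimal sketch, with the variable names matches, numbers, and total chosen for illustration:

```javascript
const text = "I have 3 apples and 5 oranges.";

// match() returns null when nothing is found, so fall back to an empty array.
const matches = text.match(/\d+/g) || [];

// Convert each matched string to a number.
const numbers = matches.map(Number);
console.log(numbers); // [3, 5]

// Add the numbers together.
const total = numbers.reduce((sum, n) => sum + n, 0);
console.log(total); // 8
```

Note that \d+ only picks up runs of digits, so a value like "2.5" would be extracted as two separate numbers, 2 and 5.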
Removing Extra White Space:
let whitespaceRegex = /\s+/g;
let sentence = "  Hello    world!  ";
console.log(sentence.trim().replace(whitespaceRegex, " ")); // Output: "Hello world!"
Here, the regular expression \s+ matches one or more consecutive whitespace characters.
The g modifier makes replace() collapse every such run into a single space, and trim() removes the whitespace at the start and end of the string.
Validating URL:
let urlRegex = /^(http|https):\/\/[a-z0-9]+([\.-][a-z0-9]+)*\.[a-z]{2,}(:\d{1,5})?(\/.*)?$/;
console.log(urlRegex.test("https://www.example.com")); // Output: true
console.log(urlRegex.test("http://example")); // Output: false
The above regular expression validates the format of a URL.
It checks that the string starts with "http://" or "https://", followed by a host name made of lowercase letters or digits, with optional further labels separated by dots or hyphens.
The TLD (top-level domain) must consist of at least two lowercase letters. The pattern also accepts an optional port of one to five digits and an optional path. Because it has no i modifier, uppercase letters in the host name are rejected.
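A few more inputs show where this pattern draws its lines. The sketch below reuses urlRegex and derives a case-insensitive variant from it. The name urlRegexIgnoreCase and the sample URLs are assumptions made for illustration.

```javascript
const urlRegex = /^(http|https):\/\/[a-z0-9]+([\.-][a-z0-9]+)*\.[a-z]{2,}(:\d{1,5})?(\/.*)?$/;

// Same pattern with the i modifier added, built from the original's source.
const urlRegexIgnoreCase = new RegExp(urlRegex.source, "i");

console.log(urlRegex.test("https://example.com:8080/path?q=1")); // true:  port and path are accepted
console.log(urlRegex.test("https://my-site.example.org/"));      // true:  hyphens and extra labels are accepted
console.log(urlRegex.test("http://localhost:3000"));             // false: there is no TLD
console.log(urlRegex.test("ftp://example.com"));                 // false: only http and https are accepted
console.log(urlRegex.test("https://Example.com"));               // false: uppercase letter in the host
console.log(urlRegexIgnoreCase.test("https://Example.com"));     // true:  the i modifier ignores case
```

Whether rejecting "http://localhost:3000" is a bug or a feature depends on the use case, which is a good reason to test a validation pattern against the inputs an application actually expects.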
Regular expressions provide a powerful and flexible way to work with patterns in JavaScript.
They can be used for a wide range of tasks, including validation, searching, and manipulation of strings.
It's worth noting that regular expressions can be complex, and understanding the various symbols and modifiers takes time and practice.