Regular expressions, commonly known as regex or regexp, are widely used tools in programming and text processing. A regular expression is a sequence of characters that forms a search pattern. These patterns are used to search, extract or manipulate text according to specific rules.
What is a regular expression?
Regular expressions (regex) are the means of describing regular languages, which belong to the so-called formal languages. They are a key instrument of theoretical computer science, which, among other things, lays the foundation for developing and executing computer programs and for building the compilers those programs require. This is why regular expressions, which follow clearly defined syntactic rules, are used mainly in the field of software development.
For every regular expression there is a so-called finite automaton (also known as a finite-state machine) that accepts the language specified by the expression; the automaton can be derived from the regular expression using Thompson's construction. Conversely, for every finite automaton there is a regular expression that describes the language the automaton accepts. This expression can be derived with Kleene's algorithm or by state elimination.
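To make the correspondence concrete, here is a minimal sketch of a deterministic finite automaton for the language of ab*c, written by hand rather than generated with Thompson's construction (which would first produce a nondeterministic automaton). The state numbering and the `dfa_accepts` helper are illustrative assumptions; the loop at the end checks that the automaton agrees with Python's `re.fullmatch` on a few inputs.

```python
import re

# Hand-built DFA for the language of ab*c (the whole string must match).
# State 0: start; state 1: read "a" followed by zero or more "b";
# state 2: read the final "c" (accepting).
# Any (state, character) pair not listed leads to an implicit dead state.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
ACCEPTING = {2}

def dfa_accepts(text):
    """Return True if the DFA is in an accepting state after reading all of text."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING

for s in ["ac", "abc", "abbbbc", "ab", "abcb", ""]:
    # The DFA and Python's regex engine should agree on every input.
    print(repr(s), dfa_accepts(s), re.fullmatch(r"ab*c", s) is not None)
```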
An emblematic example of the use of regular expressions in information technology is the "search and replace" function of text editors. This functionality was first implemented in the 1960s by Ken Thompson, an outstanding figure in computer science, one of the developers of the UNIX operating system and later a co-creator of the Go programming language. It began in a line-oriented text editor and was later carried over to that editor's successors.
Regex-based search and replace lets users look for specific character sequences in a text and, if they wish, replace them with another character sequence. This not only simplifies editing and manipulating text, but also provides a powerful tool for making large-scale, precise changes to documents.
For example, suppose a user needs to correct a word that is misspelled everywhere it appears in a document. With a regular expression, they can find every instance of the misspelled word and replace all of them with the correct spelling at once, instead of correcting them manually one by one.
This approach not only saves time and effort, but also allows complex text-handling tasks to be carried out efficiently and uniformly. Ken Thompson's implementation of this functionality in early text editors marked the beginning of the integration of regular expressions into fundamental computing tools, and it has endured as an essential technique in software development and administration to this day.
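The spelling-correction scenario described above can be sketched with Python's `re.subn`, which returns the corrected text together with the number of replacements made. The misspelling "recieve" and the sample sentence are hypothetical.

```python
import re

# Hypothetical document with a recurring misspelling.
document = "Please recieve the files. We recieve updates daily."

# \b marks word boundaries, so only the whole word "recieve" is replaced.
corrected, count = re.subn(r"\brecieve\b", "receive", document)

print(corrected)  # Please receive the files. We receive updates daily.
print(count)      # 2
```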
How does a regular expression work?
A regular expression consists either solely of ordinary characters (such as ABC) or of a combination of ordinary characters and metacharacters (such as ab*c). Metacharacters describe particular constructs or arrangements of characters: for example, whether a character must appear at the beginning of a line, or whether a character must appear exactly once, may appear more than once, or may be absent. The two example expressions work as follows:
ABC. The simple pattern ABC requires an exact match. It therefore looks for strings that not only contain the characters "A", "B" and "C" but contain them consecutively and in that order. A question such as "Do you know ABC Square?" contains a match for this expression.
ab*c. Regular expressions with special characters work differently, since they look not only for exact matches but also for variable patterns. Here the asterisk means that the search targets character sequences that begin with the letter "a", end with the letter "c" and contain any number of "b" characters in between, including none. Thus "abc", "abbbbc" and "cbbabbcba" all produce a match (in the last case, the substring "abbc"), and so does "ac".
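Both patterns can be tried with Python's `re` module. The helper `first_match` below is a hypothetical convenience wrapper around `re.search`, which scans the text and returns the first match it finds, or `None` if there is none.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text that matches pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Literal pattern: only the exact, case-sensitive sequence "ABC" matches.
print(first_match(r"ABC", "Do you know ABC Square?"))  # ABC
print(first_match(r"ABC", "Do you know abc square?"))  # None

# ab*c: an "a", then zero or more "b", then a "c".
for text in ["ac", "abc", "abbbbc", "cbbabbcba", "abd"]:
    print(text, "->", first_match(r"ab*c", text))
# ac -> ac, abc -> abc, abbbbc -> abbbbc, cbbabbcba -> abbc, abd -> None
```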
What are the syntactic rules applicable to regular expressions?
Regular expressions can be used in a variety of languages, such as Perl, Python, Ruby, JavaScript, XML or HTML, and their uses and functions may vary significantly. For example, in JavaScript, regex patterns are used with string methods such as search(), match() or replace(), while in XML documents, regular expressions are used to restrict the content of elements. Despite these differences in use, the syntax of regular expressions is remarkably uniform across the various programming and markup languages.
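For comparison, Python's standard `re` module provides similar operations: `re.search` locates the first match, `re.findall` collects all matches, and `re.sub` replaces them. The mapping to the JavaScript methods noted in the comments is only approximate (JavaScript's `search()`, for instance, returns an index directly, or -1), and the sample text is made up.

```python
import re

text = "Order 66 shipped on 2023-05-14; order 67 is pending."

# Roughly like JavaScript's text.search(/\d+/): position of the first match, or -1.
first = re.search(r"\d+", text)
position = first.start() if first else -1
print(position)  # 6

# Roughly like text.match(/\d+/g): every matching substring.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '2023', '05', '14', '67']

# Roughly like text.replace(/\d/g, "#"): replace every match.
masked = re.sub(r"\d", "#", text)
print(masked)  # Order ## shipped on ####-##-##; order ## is pending.
```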
Thus, regardless of the language in which it is used, a regular expression consists of up to three parts:
Part | Description |
---|---|
Pattern (search pattern) | The central element is the pattern, that is, the general search pattern. As explained above, it can consist of ordinary characters alone or of a combination of ordinary and special characters. |
Delimiter | Delimiters mark the beginning and end of the pattern. Essentially any non-alphanumeric, non-whitespace character other than the backslash can serve as a delimiter. In PHP, for example, hash signs (#pattern#), percent signs (%pattern%), plus signs (+pattern+) or tildes (~pattern~) can be used as delimiters. Many languages use quotation marks ("pattern") or forward slashes (/pattern/). |
Modifier | Modifiers can be appended to a search pattern to change how the regular expression behaves. One example is the i modifier, which removes the distinction between uppercase and lowercase letters. Without it, case is significant, which is the default for all regular expressions. |
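In Python, patterns are ordinary string literals (raw strings such as `r"..."` are customary so that backslashes reach the regex engine unchanged), so no delimiters are needed, and modifiers are passed as flags. The sketch below shows the case-insensitive behavior described for the i modifier, first with the `re.IGNORECASE` flag and then with the inline form `(?i)`; the sample text is made up.

```python
import re

text = "Regex, REGEX and regex"

# Default behavior: matching is case-sensitive.
print(re.findall(r"regex", text))                 # ['regex']

# Equivalent of the i modifier, passed as a flag...
print(re.findall(r"regex", text, re.IGNORECASE))  # ['Regex', 'REGEX', 'regex']

# ...or written inline at the start of the pattern.
print(re.findall(r"(?i)regex", text))             # ['Regex', 'REGEX', 'regex']
```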
The following table lists some of the special characters with which patterns can be extended:
Special Syntax Characters | Function |
---|---|
[] | Square brackets define a character class, which always matches a single character in the search pattern. |
() | Parentheses define a group of one or more characters that is treated as a unit, for example so that a quantifier applies to the whole group. |
- | Inside a character class, specifies a range (from […] to […]) when placed between two ordinary characters, as in [a-z]. |
^ | Anchors the match to the start of the string or, in multiline mode, of each line (second function: negation when it is the first character of a character class, as in [^0-9]). |
$ | Anchors the match to the end of the string or, in multiline mode, of each line. |
. | Matches any single character (in most engines, except a newline by default). |
* | The preceding character, class or group may appear any number of times, including zero. |
+ | The preceding character, class or group must appear at least once. |
? | The preceding character, class or group is optional and may appear at most once. |
{n} | The preceding character, class or group appears exactly n times. |
{n,m} | The preceding character, class or group appears at least n and at most m times. |
{n,} | The preceding character, class or group appears at least n times, with no upper limit. |
\b | Matches a word boundary, so the search respects where words begin and end. |
\B | Matches only at a position that is not a word boundary. |
\d | Any digit; shorthand for the character class [0-9]. |
\D | Any non-digit; shorthand for the character class [^0-9]. |
\w | Any word character (letter, digit or underscore); shorthand for the character class [A-Za-z0-9_]. |
\W | Any non-word character; shorthand for the character class [^A-Za-z0-9_]. |
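The following sketch applies several characters from the table to short sample strings using Python's `re.finditer`. The patterns, sample texts and the `all_matches` helper are illustrative choices; the comments note what each pattern demonstrates.

```python
import re

def all_matches(pattern, text):
    """Return every non-overlapping match of pattern in text, as full substrings."""
    return [m.group(0) for m in re.finditer(pattern, text)]

examples = [
    (r"gr[ae]y", "grey gray groy"),          # [] character class: "a" or "e"
    (r"[0-9]{2,3}", "7 42 1234"),             # - range plus {n,m} quantifier
    (r"^Hello", "Hello world"),               # ^ anchors at the start
    (r"world$", "Hello world"),               # $ anchors at the end
    (r"h.t", "hat hot h t"),                  # . matches any character, even a space
    (r"colou?r", "color colour"),             # ? makes "u" optional
    (r"(ab)+", "ababab x ab"),                # () group repeated with +
    (r"\bcat\b", "cat concatenate cat."),     # \b: whole word only
    (r"\Bcat\B", "cat concatenate"),          # \B: only inside a word
    (r"\d+", "Room 101, floor 3"),            # \d: digits
    (r"\w+", "snake_case 42!"),               # \w: letters, digits and "_"
]

for pattern, text in examples:
    print(f"{pattern:12} {all_matches(pattern, text)}")
```

Note that {2,3} is greedy: in "1234" it takes "123" and leaves a lone "4" that cannot match on its own.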
Examples of use:
Email format:
REGEX EXAMPLE:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This regular expression checks whether a string has the general format of an email address.
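A minimal Python sketch of this check, using a cleaned-up version of the pattern above. It calls `fullmatch` so that the whole string must match; this matters because in Python `$` can also match just before a trailing newline. The pattern is deliberately simple: it checks the general shape of an address and does not implement the full address grammar of RFC 5322, so it rejects some valid addresses and accepts some invalid ones. The function name `looks_like_email` is hypothetical.

```python
import re

# Simplified pattern: local part, "@", domain, ".", and a top-level domain of 2+ letters.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(address):
    """Return True if the whole string has the general shape of an email address."""
    return EMAIL_RE.fullmatch(address) is not None

for candidate in ["jane.doe@example.com", "jane@localhost",
                  "jane@@example.com", "jane@example.com\n"]:
    print(repr(candidate), looks_like_email(candidate))
```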
Search for telephone numbers:
REGEX EXAMPLE:
\d{3}-\d{3}-\d{4}
Find telephone numbers in the XXX-XXX-XXXX format.
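The sketch below applies the pattern with `re.findall`. As an addition to the original pattern (an assumption, not part of the example), a second version wraps the expression in `\b` word boundaries, so that a number embedded in a longer run of digits, such as 1555-123-4567, is not partially matched. The sample text is made up.

```python
import re

text = "Call 555-123-4567 or 555-987-6543; ignore 1555-123-4567 and 5551234567."

# Pattern exactly as in the example: it can match inside longer digit runs.
plain = re.findall(r"\d{3}-\d{3}-\d{4}", text)

# Same pattern with \b word boundaries added (assumption, not in the original).
bounded = re.findall(r"\b\d{3}-\d{3}-\d{4}\b", text)

print(plain)    # ['555-123-4567', '555-987-6543', '555-123-4567']
print(bounded)  # ['555-123-4567', '555-987-6543']
```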
HTML label extraction:
REGEX EXAMPLE:
<\s*([a-z]+)([^>]*)>
Extracts HTML tags and their attributes from a string.
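Here is a sketch of the tag-extraction pattern with `re.finditer`, using group 1 for the tag name and group 2 for the raw attribute text. The helper `extract_tags` and the HTML snippet are hypothetical. The pattern only matches opening tags written in lowercase, and regular expressions cannot parse arbitrary HTML reliably; for real documents, a parser such as Python's `html.parser` module is a safer choice.

```python
import re

TAG_RE = re.compile(r"<\s*([a-z]+)([^>]*)>")

def extract_tags(html):
    """Return (tag name, attribute text) pairs for each opening tag found."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_RE.finditer(html)]

snippet = '<p class="intro">Read the <a href="/docs" target="_blank">docs</a>.</p>'
for name, attrs in extract_tags(snippet):
    print(name, "|", attrs)
# Closing tags such as </a> are skipped because "/" is not in [a-z].
```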
Areas of use:
Data validation: Regular expressions are essential to validate specific formats, such as email addresses, telephone numbers or postal codes.
Data search and extraction: They are widely used to search and extract specific information from data sets or documents.
Text manipulation: They make it easier to perform replacements, deletions or transformations based on defined patterns.
Log and text file analysis: In development and system administration environments, they are used to search log files and other text files for specific patterns.
Software development: Regular expressions are valuable tools in software development for searching and manipulating strings efficiently.
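As an illustration of the log-analysis use case, the sketch below parses a few log lines in a hypothetical format (date, time, level, message), counts entries per level and collects the error messages. The `re.MULTILINE` flag makes `^` and `$` match at the start and end of each line rather than only at the start and end of the whole text.

```python
import re
from collections import Counter

# Hypothetical log lines; the format is an assumption for illustration.
LOG = """\
2024-03-01 12:00:01 INFO  server started
2024-03-01 12:00:05 ERROR db timeout after 30s
2024-03-01 12:01:10 WARN  slow request /api/users
2024-03-01 12:02:44 ERROR db timeout after 30s
"""

# Groups: 1 = date, 2 = time, 3 = level, 4 = message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (INFO|WARN|ERROR)\s+(.*)$",
    re.MULTILINE,
)

entries = list(LINE_RE.finditer(LOG))
levels = Counter(m.group(3) for m in entries)
errors = [m.group(4) for m in entries if m.group(3) == "ERROR"]

print(levels)
print(errors)
```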
In conclusion, regular expressions are a versatile and powerful tool in programming and text processing. By understanding and mastering their syntax, developers can carry out complex searches, validation and data manipulation efficiently and precisely.