Regular expressions
3.2.1. Introduction
A regular expression, or regex, is a sequence of characters that forms a search pattern. The pattern can be used to find, or match, text in other strings.
Regexes have their origins in theoretical computer science and are based on mathematical principles. If you would like to learn more about them, there are many resources on the web that offer detailed explanations. The Wikipedia article for regular expressions is a good place to start.
3.2.2. Basic regex syntax
We list the basic regex metacharacters and illustrate the basic syntax of regexes by way of example strings.
Metacharacters
There is a set of characters that are used in regexes as operators. These characters are called metacharacters.
If you don’t want the regex engine to interpret a character as a metacharacter, e.g. because you want to find that character itself in a string, you need to “escape” it. This means preceding the metacharacter with a backslash \, e.g. \$.
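As a minimal sketch of escaping, assuming Python's standard `re` module as the regex engine (the engine in akaAT Studio may differ in details), the unescaped `$` acts as an end-of-string anchor, while `\$` matches a literal dollar sign. The sample string is made up for illustration.

```python
import re

text = "Total: $25"  # hypothetical sample string

# Unescaped: "$" is the end-of-string anchor, so "$25" cannot match here
# (nothing can follow the end of the string).
unescaped = re.search(r"$25", text)

# Escaped: "\$" matches a literal dollar sign.
escaped = re.search(r"\$25", text)

print(unescaped)        # None
print(escaped.group())  # $25

# re.escape() adds the backslashes for you.
escaped_pattern = re.escape("$25")
print(escaped_pattern)  # \$25
```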
Metacharacter | Description |
---|---|
. | The period is a placeholder for any single character |
^ | Matches the start of a string |
$ | Matches the end of a string |
| | Functions as an either-or operator |
? | Matches the preceding element zero or one time |
+ | Matches the preceding element one or more times |
* | Matches the preceding element zero or more times |
{ } | Matches the preceding element the specified number of times |
( ) | Defines a subexpression |
[ ] | Defines a bracket expression that may contain a set of characters that other metacharacters can be applied to |
Examples
- Single-element expressions
[1234567]
- Matches a single character that is contained in the bracket expression, i.e. a digit from 1 to 7 here.
- Alternative expression: [1-7].
[Max]
- Matches any single letter contained in the bracket expression, i.e. “M”, “a”, or “x”.
- Important: Does not match the word “Max”.
[1-35-8]
- Matches any single one of the digits 1, 2, 3, 5, 6, 7, 8.
- Does not match the number 35: the expression contains the two ranges 1-3 and 5-8, not the number 35.
- Multi-element expressions
[1-9][ab]
Matches any combination of the bracket expressions, i.e. “1a”, “1b”, “2a”, “2b”, …, “9a”, “9b”.
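The bracket expressions from these examples can be checked with a short sketch, assuming Python's `re` module rather than akaAT Studio itself. `re.fullmatch` is used so that the whole string has to match the pattern. A pattern with two bracket expressions in a row, such as `[1-9][ab]`, yields the combinations “1a” to “9b”.

```python
import re

# A bracket expression matches exactly one character from the set.
assert re.fullmatch(r"[1-7]", "5")
assert re.fullmatch(r"[Max]", "a")
assert re.fullmatch(r"[Max]", "Max") is None    # three characters, not one
assert re.fullmatch(r"[1-35-8]", "4") is None   # 4 is in neither range
assert re.fullmatch(r"[1-35-8]", "35") is None  # two characters, not one

# Two bracket expressions in a row match two characters.
candidates = ("1a", "9b", "0a", "1c")
combos = [s for s in candidates if re.fullmatch(r"[1-9][ab]", s)]
print(combos)  # ['1a', '9b']
```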
- Quantifiers
The quantifier metacharacters ?, +, *, and {} are placed after the characters they should apply to.
[1-9]?
- Matches a digit from 1 to 9 zero or one time, i.e. the digit is optional.
- In the same way, (Colou?r) matches both “Colour” and “Color”, because the “u” is optional.
[1-9]+
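- Matches any sequence of one or more digits from 1 to 9, e.g. “7” or “42”.
- Does not match numbers that contain a 0 as a whole: in “105”, only the “1” and the “5” are matched. To match any whole number greater than 0, use [1-9][0-9]*.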
[0-9]{5}
Matches any five-digit number.
[0-9]{3,}
Matches any number with at least three digits.
[0-9]{3,5}
- Matches any number with at least three and at most five digits.
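The quantifier examples can be tried out with the following sketch. It assumes Python's `re` module; the helper `matches` is a made-up name that checks whether the pattern matches the entire string.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """Return True if the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

# "?" makes the preceding element optional.
print(matches(r"Colou?r", "Color"))   # True
print(matches(r"Colou?r", "Colour"))  # True

# "+" needs at least one digit from 1-9; "0" is not part of the set.
print(matches(r"[1-9]+", "472"))  # True
print(matches(r"[1-9]+", "105"))  # False

# "{n}", "{n,}" and "{n,m}" count repetitions.
print(matches(r"[0-9]{5}", "12345"))     # True
print(matches(r"[0-9]{3,}", "12"))       # False
print(matches(r"[0-9]{3,5}", "123456"))  # False
```

Without `fullmatch`, a search would also succeed inside a longer string, e.g. `[0-9]{5}` finds five digits within “1234567”.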
- Match beginning of string
^Image_
Matches all strings that start with “Image_”
(^Image_)[0-9]{3}
Matches all strings that start with “Image_” followed by a three-digit number.
Examples for matched strings: “Image_001”, “Image_999”, “Image_127”, …
(^Image_)[0-9]{3}(\.jpg)
- Matches all strings that start with “Image_” followed by a three-digit number and “.jpg”.
- Note the period character is escaped with a backslash.
- Match end of string
Sample$
Matches all strings that end in “Sample”.
(Sample[0-9]{3}$)
Matches all strings that end in “Sample” followed by a three-digit number.
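A short sketch of both anchors, assuming Python's `re` module; the file and sample names are invented for illustration. `search` looks for the pattern anywhere in the string, so it is the anchors `^` and `$` that tie the match to the start or the end.

```python
import re

names = ["Image_001.jpg", "Image_12.jpg", "My_Image_001.jpg", "Image_001xjpg"]

# "^" anchors the match at the start; "\." is a literal period.
start_pattern = re.compile(r"(^Image_)[0-9]{3}(\.jpg)")
start_hits = [n for n in names if start_pattern.search(n)]
print(start_hits)  # ['Image_001.jpg']

# "$" anchors the match at the end of the string.
end_pattern = re.compile(r"(Sample[0-9]{3}$)")
print(bool(end_pattern.search("Run2_Sample042")))  # True
print(bool(end_pattern.search("Sample042_Run2")))  # False
```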
- Placeholders
Image.*
- Matches all strings that consist of “Image” followed by any number of other characters.
- . is the placeholder for any character; the quantifier * lets it occur zero or more times.
Examples for matched strings: “Image3459834059346237832jkhdsdb”, “Image”, “ImageTheCat”
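The difference between a single placeholder `.` and the placeholder combined with the quantifier `*` can be sketched as follows, assuming Python's `re` module (where `.` matches any character except a newline by default).

```python
import re

# "." on its own stands for exactly one character.
print(bool(re.fullmatch(r"Image.", "Image1")))   # True
print(bool(re.fullmatch(r"Image.", "Image")))    # False
print(bool(re.fullmatch(r"Image.", "Image12")))  # False

# ".*" stands for any number of characters, including none.
samples = ["Image", "ImageTheCat", "Image3459834059346237832jkhdsdb"]
results = [bool(re.fullmatch(r"Image.*", s)) for s in samples]
print(results)  # [True, True, True]
```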
- Match alternatives
((G|g)r(a|e)y)
Matches the strings “Gray”, “gray”, “Grey”, and “grey”.
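A minimal check of the alternatives, assuming Python's `re` module; the word list is made up. For single characters, a bracket expression such as `[Gg]` is an equivalent shorthand for `(G|g)`.

```python
import re

words = ["Gray", "gray", "Grey", "grey", "Groy", "GREY"]

# Alternatives written with the either-or operator.
alternation = re.compile(r"((G|g)r(a|e)y)")
hits = [w for w in words if alternation.fullmatch(w)]
print(hits)  # ['Gray', 'gray', 'Grey', 'grey']

# The same set of words, written with bracket expressions.
brackets = re.compile(r"[Gg]r[ae]y")
bracket_hits = [w for w in words if brackets.fullmatch(w)]
print(hits == bracket_hits)  # True
```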
- Exclude characters
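With [^ at the start of a bracket expression you can exclude characters.
[^0-9]
Matches any single character that is not a digit. To match only strings that don’t contain any digits, combine it with anchors and a quantifier: ^[^0-9]*$.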
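A sketch of how a negated bracket expression behaves, assuming Python's `re` module. On its own, `[^0-9]` matches one character that is not a digit, so a string that contains digits can still produce a match. Rejecting every string that contains a digit takes anchors and a quantifier, as in `^[^0-9]*$`.

```python
import re

# [^0-9] matches ONE character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")
print(non_digits)  # ['a', 'b']

# A string containing digits can therefore still produce a match.
print(bool(re.search(r"[^0-9]", "abc123")))  # True

# To accept only strings without any digit, anchor and repeat the expression.
no_digits = re.compile(r"^[^0-9]*$")
print(bool(no_digits.search("abc")))     # True
print(bool(no_digits.search("abc123")))  # False
```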
3.2.3. Regex applied in akaAT Studio
Extract information with a Get text action and a regex
The Get text action allows you to extract text. Often, you may only want to extract a certain part of a text. This can be done with a regex.
Note: this number is extracted and assigned to the output variable.
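Outside akaAT Studio, the same idea (take a longer text, keep only the part that a regex matches, and store it in a variable) can be sketched in Python. This is not the Get text action itself; the sample text, the pattern, and the variable name `output_variable` are assumptions made for illustration.

```python
import re

# Hypothetical text, e.g. as read from an element on a page.
full_text = "Your order number is 48213. Thank you!"

# Keep only the first run of digits; None if the text contains no digits.
match = re.search(r"[0-9]+", full_text)
output_variable = match.group() if match else None

print(output_variable)  # 48213
```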