Regular expressions

3.2.1. Introduction

A regular expression, or regex, is a sequence of characters that forms a search pattern. This pattern can be used to find matching text in other strings.

Regexes have their origins in theoretical computer science and are based on mathematical principles. If you would like to learn more about them, there are many resources on the web that offer detailed explanations. The Wikipedia article for regular expressions is a good place to start.

3.2.2. Basic regex syntax

We list the basic regex metacharacters and illustrate the basic syntax of regexes by way of example strings.

Metacharacters

There is a set of characters that are used in regexes as operators. These characters are called metacharacters.

If you don’t want the regex engine to interpret them as metacharacters, e.g. because you want to find one in a string, you need to “escape” them. This means preceding the metacharacter with a backslash \, e.g. \$ to match a literal dollar sign.

Metacharacter Description
. The period is a placeholder for any single character
^ Matches the start of a string
$ Matches the end of a string
| Functions as an either-or operator
? Matches the preceding element zero or one time
+ Matches the preceding element one or more times
* Matches the preceding element zero or more times
{ } Matches the preceding element the specified number of times
( ) Defines a subexpression
[ ] Defines a bracket expression that may contain a set of characters that other metacharacters can be applied to
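
The effect of escaping can be tried out in any regex engine. The following is a minimal sketch using Python's re module, which is an assumption made for illustration only; the engine used by akaAT Studio may differ in details. The sample string is hypothetical.

```python
import re

# Unescaped, $ is the end-of-string anchor, so it cannot match a literal "$".
price = "Total: 25$"
unescaped = re.search(r"5$", price)   # "5" directly at the end of the string: no match
escaped = re.search(r"5\$", price)    # "5" followed by a literal dollar sign

# re.escape() adds the backslashes for every metacharacter in a string.
literal_pattern = re.escape("a.b")    # the period is now matched literally
matches_literal = re.fullmatch(literal_pattern, "a.b") is not None
matches_other = re.fullmatch(literal_pattern, "axb") is not None

print(unescaped, escaped.group(0), literal_pattern, matches_literal, matches_other)
```

Without the backslash, the pattern looks for a “5” at the very end of the string and finds nothing; with it, the dollar sign is treated as an ordinary character.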

Examples

  • Single-element expressions

[1234567]

 - Matches a single character that is contained in the bracket expression, i.e. one of the digits 1 to 7 here.

 - Alternative expression: [1-7].

[Max]

​ - Matches any single letter contained in the bracket expression, i.e. “M”, “a”, or “x”.

​ - Important: Does not match the word “Max”.

[1-35-8]

 - Matches a single digit from the ranges 1-3 and 5-8, i.e. 1, 2, 3, 5, 6, 7, or 8.

 - Does not match the number 35 as a whole.

  • Multi-element expressions

[1-9][ab]

Matches any combination of the two bracket expressions, i.e. “1a”, “1b”, “2a”, “2b”, …, “9a”, “9b”.
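
The bracket expressions above can be checked with a short script. This is a sketch in Python's re module (an assumption for illustration, not necessarily the engine behind akaAT Studio). The helper matches() is hypothetical, and the last pattern, [1-9][ab], is an assumed reading of the “1a” … “9b” example.

```python
import re

def matches(pattern, text):
    # re.fullmatch() succeeds only if the whole string matches the pattern.
    return re.fullmatch(pattern, text) is not None

digit_1_to_7 = [s for s in ["0", "4", "7", "8"] if matches(r"[1-7]", s)]
max_letters = [s for s in ["M", "a", "x", "Max"] if matches(r"[Max]", s)]
two_ranges = [s for s in ["3", "4", "5", "35"] if matches(r"[1-35-8]", s)]
digit_then_letter = [s for s in ["1a", "9b", "0a", "1c"] if matches(r"[1-9][ab]", s)]

print(digit_1_to_7)       # ['4', '7']
print(max_letters)        # ['M', 'a', 'x']
print(two_ranges)         # ['3', '5']
print(digit_then_letter)  # ['1a', '9b']
```

A single bracket expression always stands for exactly one character, which is why neither “Max” nor “35” is matched as a whole.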

  • Quantifiers

The quantifier metacharacters ?, +, *, and {} are placed after the characters they should apply to.

[1-9]?

 - Matches a digit between 1 and 9 zero or one time, i.e. the digit is optional.

 - Similarly, (Colou?r) matches both “Colour” and “Color”, because the “u” is optional.

[1-9]+

 - Matches any sequence of one or more digits between 1 and 9.

 - In other words, matches any positive number that does not contain the digit 0, e.g. “7” or “42”. It does not match “10” as a whole.

[0-9]{5}

​ Matches any five-digit number.

[0-9]{3,}

​ Matches any number with at least three digits.

[0-9]{3,5}

​ - Matches any number with at least three and at most five digits.
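
To see how the quantifiers behave, the patterns above can be tested against a few sample strings. The sketch below assumes Python's re module purely for illustration. The helper matches() and the sample strings are hypothetical.

```python
import re

def matches(pattern, text):
    # True only if the entire string matches the pattern.
    return re.fullmatch(pattern, text) is not None

optional_u = [s for s in ["Color", "Colour", "Colouur"] if matches(r"Colou?r", s)]
one_or_more = [s for s in ["7", "42", "10", ""] if matches(r"[1-9]+", s)]
exactly_five = [s for s in ["1234", "12345", "123456"] if matches(r"[0-9]{5}", s)]
three_or_more = [s for s in ["12", "123", "123456"] if matches(r"[0-9]{3,}", s)]
three_to_five = [s for s in ["12", "123", "12345", "123456"] if matches(r"[0-9]{3,5}", s)]

print(optional_u)     # ['Color', 'Colour']
print(one_or_more)    # ['7', '42']
print(exactly_five)   # ['12345']
print(three_or_more)  # ['123', '123456']
print(three_to_five)  # ['123', '12345']
```

Note that “10” is rejected by [1-9]+ because the bracket expression does not include the digit 0.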

  • Match beginning of string

^Image_

​ Matches all strings that start with “Image_”

(^Image_)[0-9]{3}

 Matches all strings that start with “Image_” followed by a three-digit number.

​ Examples for matched strings: “Image_001”, “Image_999”, “Image_127”, …

(^Image_)[0-9]{3}(\.jpg)

 - Matches all strings that start with “Image_” followed by a three-digit number and “.jpg”.

 - Note that the period character is escaped with a backslash so that it is matched literally.
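
The difference between an escaped and an unescaped period is easy to miss, so here is a small sketch that compares the two. It assumes Python's re module for illustration only, and the file names are hypothetical.

```python
import re

# Hypothetical file names, used only for illustration.
names = ["Image_001.jpg", "Image_127.jpg", "Image_12.jpg",
         "My_Image_001.jpg", "Image_001xjpg"]

# Escaped period: only a literal "." is accepted before "jpg".
escaped = [n for n in names if re.search(r"(^Image_)[0-9]{3}(\.jpg)", n)]

# Unescaped period: any character is accepted before "jpg".
unescaped = [n for n in names if re.search(r"(^Image_)[0-9]{3}(.jpg)", n)]

print(escaped)    # ['Image_001.jpg', 'Image_127.jpg']
print(unescaped)  # ['Image_001.jpg', 'Image_127.jpg', 'Image_001xjpg']
```

“My_Image_001.jpg” is rejected in both cases because ^ requires “Image_” to be at the very start of the string.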

  • Match end of string

Sample$

​ Matches all strings that end in “Sample”.

(Sample[0-9]{3}$)

​ Matches all strings that end in “Sample” followed by a three-digit number.
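
The end-of-string examples can be sketched the same way. Python's re module is assumed here for illustration, and the sample names are hypothetical.

```python
import re

# Hypothetical sample names, used only for illustration.
names = ["Sample", "BloodSample", "Sample_old",
         "Sample123", "Sample1234", "Sample123.txt"]

ends_in_sample = [n for n in names if re.search(r"Sample$", n)]
ends_in_sample_number = [n for n in names if re.search(r"(Sample[0-9]{3}$)", n)]

print(ends_in_sample)         # ['Sample', 'BloodSample']
print(ends_in_sample_number)  # ['Sample123']
```

“Sample1234” is not matched by the second pattern, because “Sample” must be followed by exactly three digits and then the end of the string.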

  • Placeholders

Image.*

 - Matches all strings that consist of “Image” followed by any number of other characters.

 - . is the placeholder for any single character; the quantifier * allows it to occur zero or more times.

​ Examples for matched strings: “Image3459834059346237832jkhdsdb”, “Image”, “ImageTheCat”
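
A period on its own stands for exactly one character, so it needs a quantifier to cover “any number of characters”. The sketch below, which assumes Python's re module for illustration, compares the two variants when the whole string has to match.

```python
import re

names = ["Image", "Image7", "ImageTheCat"]

# ".*" allows zero or more arbitrary characters after "Image".
any_suffix = [n for n in names if re.fullmatch(r"Image.*", n)]

# A single "." requires exactly one further character.
one_char = [n for n in names if re.fullmatch(r"Image.", n)]

print(any_suffix)  # ['Image', 'Image7', 'ImageTheCat']
print(one_char)    # ['Image7']
```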

  • Match alternatives

((G|g)r(a|e)y)

​ Matches the strings “Gray”, “gray”, “Grey”, and “grey”.
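
As a quick check, the alternatives pattern can be run against a sample sentence. This sketch assumes Python's re module for illustration; the sample text is hypothetical. Because each alternative here is a single character, a bracket expression gives the same result.

```python
import re

text = "Gray, gray, Grey, grey, Groy"

# finditer() is used because findall() returns the groups, not the full match,
# when the pattern contains parentheses.
spellings = [m.group(0) for m in re.finditer(r"((G|g)r(a|e)y)", text)]

# Equivalent pattern written with bracket expressions instead of alternatives.
same_with_brackets = re.findall(r"[Gg]r[ae]y", text)

print(spellings)           # ['Gray', 'gray', 'Grey', 'grey']
print(same_with_brackets)  # ['Gray', 'gray', 'Grey', 'grey']
```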

  • Exclude characters

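With [^ at the beginning of a bracket expression you can exclude characters.

[^0-9]

 Matches any single character that is not a digit. To match only strings that contain no digits at all, combine it with anchors and a quantifier: ^[^0-9]*$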
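
A negated bracket expression still stands for one character at a time. The following sketch, assuming Python's re module for illustration, shows the difference between matching single non-digit characters and checking that a whole string is free of digits.

```python
import re

# Each match is one character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# With a quantifier and a whole-string match, only digit-free strings pass.
no_digits_at_all = [s for s in ["abc", "a1b2", ""] if re.fullmatch(r"[^0-9]*", s)]

print(non_digits)        # ['a', 'b']
print(no_digits_at_all)  # ['abc', '']
```

The empty string passes the second check because * also allows zero occurrences.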

3.2.3. Regex applied in akaAT Studio

Extract information with a Get text action and a regex

The Get text action allows you to extract text. Often, you may only want to extract a certain part of a text. This can be done with a regex.

Note: this number is extracted and assigned to the output variable.
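
The same idea, extracting only the part of a text that matches a regex, can be sketched outside of akaAT Studio. The example below uses Python's re module and is not the implementation of the Get text action; the sample text and the variable names are hypothetical.

```python
import re

# Hypothetical text, standing in for whatever the Get text action reads.
element_text = "Your order number is 48213. Thank you!"

# Find the first run of digits; search() returns None if there is none.
match = re.search(r"[0-9]+", element_text)
order_number = match.group(0) if match else None

print(order_number)  # 48213
```

Only the matched digits end up in the variable, while the surrounding text is discarded.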