

6 Patterns



A regular expression - or regex - is just another way of saying pattern. (Axmud uses these terms interchangeably.)

Many times a second, Axmud will look at a line of text and ask, 'Does it match this pattern?'

Axmud uses regular expressions all the time, so it's important to know something about them.

There are a million and one tutorials on the internet, but here is another one anyway. It's as short as possible, and all the examples relate to what you might see in a MUD. (Skip to Section 7 if you already understand regular expressions.)

6.1 Regular expressions

A pattern can be as simple as the word troll. The pattern troll matches all of these lines:

    You see a troll, two dwarves and an elf
    You kill the troll
    There are five trolls here

But it doesn't match any of these lines:

    You see an orc, two dwarves and an elf
    You kill the orc
    You see the Troll

Patterns are usually case-sensitive. The last line above doesn't match the pattern because it contains Troll, and we're looking for troll.

Patterns can be longer than a single word. The pattern kill the orc matches the second line (but not the others).
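If you'd like to experiment outside Axmud, any regex engine will do. The sketch below uses Python's re module, chosen only for illustration (it is not how Axmud itself is implemented); for the simple patterns in this section, Python should behave the same way. It tests the patterns troll and kill the orc against the example lines above.

```python
import re

# Example lines copied from the text above
lines = [
    "You see a troll, two dwarves and an elf",
    "You kill the troll",
    "There are five trolls here",
    "You see an orc, two dwarves and an elf",
    "You kill the orc",
    "You see the Troll",
]

# re.search() looks for the pattern anywhere in the line (case-sensitive by default)
troll_matches = [line for line in lines if re.search(r"troll", line)]
orc_kill_matches = [line for line in lines if re.search(r"kill the orc", line)]

# The optional re.IGNORECASE flag makes Python treat "Troll" and "troll" alike
ignore_case_hit = re.search(r"troll", "You see the Troll", re.IGNORECASE)

print(troll_matches)     # the first three lines
print(orc_kill_matches)  # ['You kill the orc']
```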

6.1.1 Metacharacters

Sometimes we need to look for lines that begin with a certain pattern. The caret character ( ^ ) means that this pattern must appear at the beginning of the line.

The pattern ^troll matches both of these lines:

    troll on the floor, bleeding to death
    trolls on the floor, bleeding to death

But it doesn't match either of these lines:

    You see a troll
    There are five trolls here
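A quick way to check the caret's behaviour is this small sketch, again using Python's re module purely for illustration. It keeps only the lines that start with troll.

```python
import re

lines = [
    "troll on the floor, bleeding to death",
    "trolls on the floor, bleeding to death",
    "You see a troll",
    "There are five trolls here",
]

# ^troll : "troll" must be the very first thing on the line
starts_with_troll = [line for line in lines if re.search(r"^troll", line)]

print(starts_with_troll)  # the first two lines only
```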

At other times we need to look for lines which end with a certain pattern. The dollar character ( $ ) means that this pattern must appear at the end of the line.

The pattern troll$ matches both of these lines:

    You see a troll
    You kill the troll

Sometimes we will use both special characters together. The pattern ^You kill the troll$ matches one line, and one line only:

    You kill the troll

Needless to say, the ^ and $ characters should appear only at the beginning/end of the pattern (and not somewhere in the middle).
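The sketch below (Python's re module, used only for illustration) compares troll$ with ^You kill the troll$. The last two lines in the list are extra illustrative lines, not taken from the text above.

```python
import re

lines = [
    "You see a troll",
    "You kill the troll",
    "You kill the troll quickly",   # extra illustrative line
    "Now you kill the troll",       # extra illustrative line
]

# troll$ : "troll" must be the last thing on the line
ends_with_troll = [line for line in lines if re.search(r"troll$", line)]

# ^...$ : the whole line must be exactly "You kill the troll"
exact_line = [line for line in lines if re.search(r"^You kill the troll$", line)]

print(ends_with_troll)  # every line except "You kill the troll quickly"
print(exact_line)       # ['You kill the troll']
```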

6.1.2 Matching any text

Very often we'll need to match a line like this:

    You are carrying 500 gold coins

In a pattern, we can use the full stop (period) character to mean any character. For example, the pattern d.g will match all of the following lines:

    dig
    dog
    dug
    dagger
    degree

The character combination .* (a full stop/period followed by an asterisk) is very important. It means any text at all.

So, the pattern You are carrying .* gold coins matches all of the following lines:

    You are carrying 100 gold coins
    You are carrying 500 gold coins
    You are carrying 100000000 gold coins
    You are carrying no gold coins
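This Python sketch (re module, used only for illustration) confirms that all four lines match, and also shows that .* on its own is happy to match an empty string.

```python
import re

pattern = r"You are carrying .* gold coins"
lines = [
    "You are carrying 100 gold coins",
    "You are carrying 500 gold coins",
    "You are carrying 100000000 gold coins",
    "You are carrying no gold coins",
]

# .* accepts whatever text sits between "carrying " and " gold coins"
all_match = all(re.search(pattern, line) for line in lines)

# .* can also match no text at all: the whole empty string matches it
empty_match = re.fullmatch(r".*", "")

print(all_match)                  # True
print(repr(empty_match.group()))  # ''
```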

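.* actually means any text, including no text at all. Note, though, that the spaces on either side of .* in the pattern above must still be present, so that pattern won't match You are carrying gold coins (which has only one space between carrying and gold). The pattern You are carrying.*gold coins, without those spaces, matches all of the lines above and this one too:

    You are carrying gold coins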

6.1.3 Escape sequences

We can use a full stop (period) to mean any character, but sometimes you will want to be more specific.

The escape sequence \w - a backslash followed by the lower-case letter w - means any letter, number or underline (underscore). So, the pattern b\wll matches all of these lines:

    I see a ball
    I see a bell
    I see a bill

But it won't match this line:

    I see a b@ll

... because the @ character is not a letter, a number or an underline (underscore).

The combination \W means the exact opposite - any character except a letter, number or underline (underscore). So, the pattern b\Wll does match this line:

    I see a b@ll

... but it doesn't match these lines:

    I see a ball
    I see a bell
    I see a bill
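Both escape sequences can be tried side by side in this Python sketch (re module, for illustration only):

```python
import re

lines = ["I see a ball", "I see a bell", "I see a bill", "I see a b@ll"]

# \w : one letter, digit or underscore
word_char = [line for line in lines if re.search(r"b\wll", line)]

# \W : one character that is NOT a letter, digit or underscore
non_word_char = [line for line in lines if re.search(r"b\Wll", line)]

print(word_char)      # the ball, bell and bill lines
print(non_word_char)  # ['I see a b@ll']
```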

One more important escape sequence is \s, which means any whitespace character, such as a space or a tab. The opposite is \S, which means any character except a whitespace character.
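The following Python sketch (re module, for illustration only) uses three made-up lines with a space, a tab and a hyphen between gold and coins.

```python
import re

# Illustrative lines: a space, a tab, and a hyphen between "gold" and "coins"
lines = ["gold coins", "gold\tcoins", "gold-coins"]

# \s : one whitespace character, such as a space or a tab
spaced = [line for line in lines if re.search(r"gold\scoins", line)]

# \S : one character that is NOT whitespace
unspaced = [line for line in lines if re.search(r"gold\Scoins", line)]

print(spaced)    # ['gold coins', 'gold\tcoins']
print(unspaced)  # ['gold-coins']
```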

6.1.4 Quantifiers

Sometimes we'll need a pattern which can match any of these lines:

    You kill the cat
    You kill the caat
    You kill the caaaaaaat

The character combination a+ means one or more letter "a" characters. So, the pattern ca+t matches all of the lines above, but it doesn't match:

    You kill the kitten

You can use the plus sign ( + ) after any character or escape sequence. For example, \d means a single digit (0-9), but \d+ means one or more digits.

The pattern You have \d+ gold coins matches both of the following lines:

    You have 50 gold coins
    You have 1000000 gold coins

...but it doesn't match this line:

    You have no gold coins
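The same three lines, checked with Python's re module (used here only for illustration):

```python
import re

pattern = r"You have \d+ gold coins"
lines = [
    "You have 50 gold coins",
    "You have 1000000 gold coins",
    "You have no gold coins",
]

# \d+ : one or more digits; the word "no" contains none
matches = [line for line in lines if re.search(pattern, line)]

print(matches)  # the first two lines
```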

You can also use a question mark ( ? ) after any character. It means zero or one of that character (but not more). And you've already seen that the asterisk ( * ) means zero, one or more of the preceding character.
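This Python sketch (re module, for illustration only; the example lines are made up) contrasts ? with *. It uses re.fullmatch(), which only succeeds if the pattern covers the whole line, much like wrapping the pattern in ^ and $.

```python
import re

# s? : zero or one "s"
optional = [
    line
    for line in ["You kill the troll", "You kill the trolls", "You kill the trollss"]
    if re.fullmatch(r"You kill the trolls?", line)
]

# a* : zero, one or more "a" characters
star = [
    line
    for line in ["You kill the ct", "You kill the cat", "You kill the caaat", "You kill the cot"]
    if re.fullmatch(r"You kill the ca*t", line)
]

print(optional)  # ['You kill the troll', 'You kill the trolls']
print(star)      # ['You kill the ct', 'You kill the cat', 'You kill the caaat']
```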

6.1.5 Substrings

The pre-configured worlds have been set up to look for lines like this:

    You have 100 gold coins

The patterns they use often look like this:

    You have (.*) gold coins

A pair of brackets (parentheses) means 'save everything in the middle for later'. In this case, we don't just want to recognise a line matching this pattern - we want to store the number of gold coins for later use.

The (.*) combination is one example of a group. The text that the group matches - in this case, 100 - is called the substring.
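Here is a Python sketch of saving a substring (re module, used only for illustration; Axmud stores substrings in its own way). In Python, match.group(1) returns the text captured by the first group.

```python
import re

match = re.search(r"You have (.*) gold coins", "You have 100 gold coins")

# group(1) returns the substring captured by the first pair of brackets
coins = match.group(1) if match else None

print(coins)  # 100
```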

Sometimes we'll need to use several groups on the same line.

    You have (.*) gold, (.*) silver and (.*) brass coins

Because we have three groups, three different substrings are stored for later. These substrings are numbered 1, 2 and 3 (not 0, 1 and 2).
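The Python sketch below (re module, for illustration only; the coin amounts are made up) retrieves all three substrings by number. Note a Python-specific detail: group(0) isn't the first substring but the whole matched text, which is one reason the substrings themselves start at 1.

```python
import re

line = "You have 12 gold, 5 silver and 30 brass coins"
match = re.search(r"You have (.*) gold, (.*) silver and (.*) brass coins", line)

if match:
    # Substrings are numbered from 1; group(0) is the whole matched text
    gold, silver, brass = match.group(1), match.group(2), match.group(3)
    print(gold, silver, brass)  # 12 5 30
```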

That's the end of the regular expression tutorial. In the next Section we'll go ahead and create some interfaces.

