Using Regular Expressions

The regular expression API is documented in EReg.
Haxe has built-in support for regular expressions. They can be used to verify the format of a string or to extract regular data from a given text. A regular expression literal starts with ~/ and ends with a single /, optionally followed by flags:

    var r : EReg = ~/world/;
    var str = "hello world";
    trace(r.match(str)); // true : 'world' was found in the string
    trace(r.match("hello !")); // false
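
When the pattern is only known at runtime, the EReg constructor can be used instead of the ~/.../ literal. The following is a minimal sketch; the helper name containsWord is hypothetical, and it assumes the word contains no regular expression metacharacters:

```haxe
class Main {
    // Hypothetical helper: true if `word` occurs in `text`, ignoring case.
    // Assumes `word` contains no regex metacharacters.
    public static function containsWord(text:String, word:String):Bool {
        // Same effect as the literal ~/world/i, but built from a string.
        var r = new EReg(word, "i");
        return r.match(text);
    }

    static function main() {
        trace(containsWord("hello World", "world")); // true
        trace(containsWord("hello !", "world"));     // false
    }
}
```

The second constructor argument takes the same flags that can follow the closing / of a literal.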

You can use the standard regular expression patterns, including (but not limited to):

  • . : any character
  • * : repeat zero-or-more
  • + : repeat one-or-more
  • ? : optional zero-or-one
  • [A-Z0-9] : character ranges
  • [^\r\n\t] : character not-in-range
  • (...) : parentheses to group characters and capture what they match
  • ^ : beginning of the string (beginning of a line in multiline matching mode)
  • $ : end of the string (end of a line in multiline matching mode)
  • | : alternation ("OR")
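
As an illustration, several of these patterns can be combined in one expression. This is only a sketch; the helper name isPetTag and the format it accepts are made up for the example:

```haxe
class Main {
    // Hypothetical helper: accepts strings such as "cat", "cats" or "dog42".
    public static function isPetTag(s:String):Bool {
        // ^ and $ anchor the whole string, (cat|dog) is an alternation,
        // s? is an optional "s", [0-9]* allows zero or more trailing digits.
        var r = ~/^(cat|dog)s?[0-9]*$/;
        return r.match(s);
    }

    static function main() {
        trace(isPetTag("cats"));   // true
        trace(isPetTag("dog42"));  // true
        trace(isPetTag("bird"));   // false
        trace(isPetTag("my cat")); // false : ^ requires "cat" or "dog" at the start
    }
}
```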

For example, the following regular expression recognizes a typical email address (it is a simplification: only two- or three-letter top-level domains are accepted):

    ~/[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z][A-Z][A-Z]?/i;

Note that the i at the end of the regular expression is a flag that enables case-insensitive matching.
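
To validate a whole string rather than find an address somewhere inside it, the pattern can be anchored with ^ and $. A minimal sketch, assuming a hypothetical helper named looksLikeEmail:

```haxe
class Main {
    // Hypothetical helper: true if the whole string looks like an email address.
    // Uses the simplified pattern from the text, anchored at both ends.
    public static function looksLikeEmail(s:String):Bool {
        var r = ~/^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z][A-Z][A-Z]?$/i;
        return r.match(s);
    }

    static function main() {
        trace(looksLikeEmail("John.Doe@example.com")); // true
        trace(looksLikeEmail("not an email"));         // false
        trace(looksLikeEmail("user@example.info"));    // false : 4-letter domain ending
    }
}
```

The last call shows the limit of this pattern: addresses with longer top-level domains are rejected, so it should be treated as a format hint rather than a full validation.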

The following flags are available:

  • i : case insensitive matching
  • g : global replace or split, see below
  • m : multiline matching, ^ and $ match the beginning and end of each line
  • s : the dot . also matches newlines (Haxe/Neko, Haxe/C++, Haxe/PHP and Haxe/Java only)
  • u : use UTF-8 matching (Haxe/Neko and Haxe/C++ only)
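
Flags can be combined after the closing /. The sketch below compares a pattern without flags to the same pattern with i and m; the helper name hasSecondLine is hypothetical:

```haxe
class Main {
    // Hypothetical helper: true if any line of `text` starts with "second",
    // whatever the letter case.
    public static function hasSecondLine(text:String):Bool {
        // i : ignore case, m : ^ matches at the start of each line
        var r = ~/^second/im;
        return r.match(text);
    }

    static function main() {
        var text = "first line\nSecond line";
        var plain = ~/^second/;
        trace(plain.match(text));   // false : without flags, ^ only matches at position 0
        trace(hasSecondLine(text)); // true
    }
}
```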

Groups

You can extract information by using groups:

   var str = "Nicolas is 26 years old";
   var r = ~/([A-Za-z]+) is ([0-9]+) years old/;
   r.match(str);
   trace(r.matched(1)); // "Nicolas"
   trace(r.matched(2)); // "26"
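
In practice it is safer to check the result of r.match before reading the groups, since reading a group after a failed match is not reliable across targets. A minimal sketch, assuming a hypothetical helper parsePair and a made-up "key=value" format:

```haxe
class Main {
    // Hypothetical helper: splits "key=value" into [key, value],
    // or returns null when the string does not have that shape.
    public static function parsePair(s:String):Array<String> {
        var r = ~/^([a-z]+)=([0-9]+)$/;
        if (!r.match(s)) {
            return null; // no match : do not call r.matched()
        }
        return [r.matched(1), r.matched(2)];
    }

    static function main() {
        trace(parsePair("age=26"));  // ["age","26"]
        trace(parsePair("age=old")); // null
    }
}
```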

r.matched(0) always returns the whole matched substring, and r.matchedPos() returns the position and length of this substring in the original string:

   var str = "abcdeeeeefghi";
   var r = ~/e+/;
   r.match(str);
   trace(r.matched(0)); // "eeeee"
   trace(r.matchedPos()); // { pos : 4, len : 5 }
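
The EReg class also provides matchedLeft() and matchedRight(), which are not covered above; they return the text before and after the match. The sketch below uses them together with matchedPos(); the helper name bracketMatch is hypothetical:

```haxe
class Main {
    // Hypothetical helper: wraps the first run of "e" characters in brackets.
    public static function bracketMatch(s:String):String {
        var r = ~/e+/;
        if (!r.match(s)) {
            return s; // nothing to highlight
        }
        return r.matchedLeft() + "[" + r.matched(0) + "]" + r.matchedRight();
    }

    static function main() {
        var str = "abcdeeeeefghi";
        trace(bracketMatch(str)); // "abcd[eeeee]fghi"

        // matchedPos() describes the same substring as matched(0).
        var r = ~/e+/;
        r.match(str);
        var p = r.matchedPos();
        trace(str.substr(p.pos, p.len) == r.matched(0)); // true
    }
}
```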

Replace

A regular expression can also be used to replace parts of a string:

   var str = "aaabcbcbcbz";
   var r = ~/b[^c]/g; // g : replace all instances
   trace(r.replace(str,"xx")); // "aaabcbcbcxx"
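
The effect of the g flag is easier to see on a string with several matches. A small sketch, with hypothetical helper names replaceFirst and replaceAll:

```haxe
class Main {
    // Hypothetical helper: without g, only the first match is replaced.
    public static function replaceFirst(s:String):String {
        var r = ~/-/;
        return r.replace(s, "+");
    }

    // Hypothetical helper: with g, every match is replaced.
    public static function replaceAll(s:String):String {
        var r = ~/-/g;
        return r.replace(s, "+");
    }

    static function main() {
        trace(replaceFirst("a-b-c")); // "a+b-c"
        trace(replaceAll("a-b-c"));   // "a+b+c"
    }
}
```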

You can use $1, $2, and so on to reuse a matched group in the replacement:

   var str = "{hello} {0} {again}";
   var r = ~/\{([a-z]+)\}/g; // braces escaped so they are always read as literal characters
   trace(r.replace(str,"*$1*")); // "*hello* {0} *again*"
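
Several groups can be reused in any order. The following sketch swaps two captured words; the helper name firstNameFirst is hypothetical. The replacement is written with double quotes on purpose, since newer Haxe versions interpolate $ inside single-quoted strings:

```haxe
class Main {
    // Hypothetical helper: rewrites every "Last, First" pair as "First Last".
    public static function firstNameFirst(s:String):String {
        var r = ~/([A-Za-z]+), ([A-Za-z]+)/g;
        // $1 is the first group (last name), $2 the second (first name).
        return r.replace(s, "$2 $1");
    }

    static function main() {
        trace(firstNameFirst("Doe, John; Smith, Anna")); // "John Doe; Anna Smith"
    }
}
```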

Split

A regular expression can also be used to split a string into several substrings. In this case, the delimiter is not a constant string but a regular expression:

  var str = "XaaaYababZbbbW";
  var r = ~/[ab]+/g;
  trace(r.split(str)); // ["X","Y","Z","W"]
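
A common use is splitting on a delimiter that may be surrounded by whitespace, which a constant string delimiter cannot express. A minimal sketch, assuming a hypothetical helper named splitList:

```haxe
class Main {
    // Hypothetical helper: splits a comma-separated list and drops the
    // whitespace around each comma.
    public static function splitList(s:String):Array<String> {
        var r = ~/\s*,\s*/g; // a comma with optional whitespace on both sides
        return r.split(s);
    }

    static function main() {
        trace(splitList("red ,green,  blue , white")); // ["red","green","blue","white"]
    }
}
```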

Implementation Details

The implementation of regular expressions depends on the target:

  • in JavaScript, the browser provides the implementation through the RegExp object
  • in Neko and C++, the PCRE library is used
  • in Flash 9, the native implementation is used
  • FIXME: in Flash 6/8, no implementation is available yet; it will be a pure Haxe version (compatible, but very slow since it is not native)

version #15783, modified 2012-12-10 02:41:50 by TheHippo