Validating an XML Structure

XML is a way to store and transfer data in a structured manner, so validating this data can be paramount for the reliability of any process or application. XML must be well formed and valid. Well formed means following the syntax rules of the language: closing every opened tag, nesting elements properly, using the same case in matching opening and closing tags, and so on. Xml and haxe.xml.Fast will already take care of that validation, as they will throw an exception if you feed them XML that is not well formed. However, well formed is not enough. Sometimes you need to make sure a given XML document follows a predefined structure, and that the values in it meet certain criteria. In these cases, you can use the haxe.xml.Check API to ensure that an XML document respects a given structure.

Currently, the most common ways to validate an XML structure are DTD and Schema. Like everything in life, both have pros and cons. DTD is simple to learn and use, but it is limited to defining the structure of the XML document, not the data it carries. Schema is far more powerful and is itself based on XML, but this power comes at the price of increased complexity.

Haxe takes a third approach, more of a middle ground, defining a custom method for validating XML structure. Why reinvent the wheel, you ask? In the words of Nicolas Cannasse, because "It's more powerful than DTD which is very limited, and less complex than Schema which is a mess."

Well, enough talk, to the tutorial!

My first validation

haxe.xml.Check is quite simple yet very powerful. However, if you have never used DTD or Schema, it may be a little confusing at first. The concept is simple: you define a set of rules, and validate your XML against those rules. For example, suppose you have the following XML document stored in a variable of type Xml called myXml:

var str:String ='<myElement><myChildElement>myData</myChildElement></myElement>';
var myXml:Xml = Xml.parse(str);

You want to make sure that <myElement> always has a child element called myChildElement, and that the content of myChildElement is plain text. For this, you first declare a variable of type Rule like this:

// Rule is declared in the haxe.xml.Check module, so that module must be imported.
var myRule:Rule;

Then, you assign it a set of rules like this:

myRule = RNode('myElement',[],RNode('myChildElement',[],RData()));

Then apply the check with:

haxe.xml.Check.checkDocument(myXml,myRule);
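
Put together, the three snippets above make up a small program. What follows is only a minimal sketch: the class name FirstValidation and the helper validates are made up for illustration, and the helper does nothing more than turn the exception thrown by checkDocument into a Bool.

```haxe
// Save as FirstValidation.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class FirstValidation {
    // The rule from the text: myElement > myChildElement > plain text.
    public static var myRule:Rule = RNode('myElement', [], RNode('myChildElement', [], RData()));

    // Hypothetical helper: true when the document parsed from `source`
    // satisfies `rule`, false when parsing or checking throws.
    public static function validates(source:String, rule:Rule):Bool {
        var ok = true;
        try {
            Check.checkDocument(Xml.parse(source), rule);
        } catch (e:Dynamic) {
            ok = false;
        }
        return ok;
    }

    static function main() {
        // Matches the rule, so this traces true.
        trace(validates('<myElement><myChildElement>myData</myChildElement></myElement>', myRule));
        // Wrong child name: checkDocument throws, so this traces false.
        trace(validates('<myElement><other>myData</other></myElement>', myRule));
    }
}
```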

Wowowowo, that was too quick, you say? Then let me explain:

If you look up Rule, you will see that it is not a class but an enum. It is declared in the haxe.xml.Check module, so do not forget to import that module, otherwise the compiler will complain:

import haxe.xml.Check;

This enum consists of several constructors, but let's focus on the ones we are using right now, RNode and RData.

Understanding Rules

Let's explain all the possible rules one-by-one...

RNode

RNode validates a single element (node). As you can see, RNode takes one mandatory parameter and two optional parameters. The first parameter is name, the name of the node that you are validating. Following the previous example, the name of the node is myElement.

The second and third parameters are an array of attributes and another rule (kind of recursive, isn't it?). Since this node has no attributes, the second parameter is an empty array, written []. The third parameter takes a Rule, and this is where the good part comes in: it can be any other rule. In this case, the child node of myElement is another node, called myChildElement, so we define this parameter as another RNode with its own parameters: the name myChildElement, an empty array of attributes [], and a child of type RData.

Another example, to validate a document with a single node:

// A node called 'myNode', with no attributes and no children.
var testRule = RNode('myNode');
// This one will pass validation:
var testXml = Xml.parse('<myNode/>');
haxe.xml.Check.checkDocument(testXml,testRule);
// This one will NOT pass validation, as it has an attribute:
var test2Xml = Xml.parse('<myNode attribute="something"/>');
haxe.xml.Check.checkDocument(test2Xml,testRule); // Exception will be thrown
// This one will not pass either, as it has a child:
var test3Xml = Xml.parse('<myNode>Some data</myNode>');
haxe.xml.Check.checkDocument(test3Xml,testRule); // Exception will be thrown

So, if we wanted a Rule that test2Xml validates against, we would need to add that second parameter: an Array<Attrib>.

As we can see, the Attrib constructor (Att) takes one required parameter, the attribute's name, and two optional ones: a filter (we will cover that later on) and a default value (no need for it for now).

var test2Rule = RNode('myNode',[Attrib.Att("attribute")]);
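
To see how attribute declarations behave, here is a small sketch. The class name AttributeRules, the helper validates and the default value "fallback" are made up for illustration. It shows that a declared attribute without a default value is required, that undeclared attributes are rejected, and that giving a default value makes the attribute optional.

```haxe
// Save as AttributeRules.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class AttributeRules {
    // "attribute" is required because no default value is given.
    public static var required:Rule = RNode('myNode', [Attrib.Att("attribute")]);
    // "attribute" may be left out because a default value is supplied
    // (null means "no filter"; "fallback" is an arbitrary example value).
    public static var withDefault:Rule = RNode('myNode', [Attrib.Att("attribute", null, "fallback")]);

    // Hypothetical helper: true when `source` satisfies `rule`.
    public static function validates(source:String, rule:Rule):Bool {
        var ok = true;
        try {
            Check.checkDocument(Xml.parse(source), rule);
        } catch (e:Dynamic) {
            ok = false;
        }
        return ok;
    }

    static function main() {
        trace(validates('<myNode attribute="something"/>', required)); // traces true
        trace(validates('<myNode/>', required)); // traces false: the attribute is missing
        trace(validates('<myNode attribute="a" other="b"/>', required)); // traces false: "other" is not declared
        trace(validates('<myNode/>', withDefault)); // traces true
    }
}
```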

ROptional

The rule may or may not apply:

ROptional(RNode("child_e",[],RData()));

This rule says there can be zero or one node named child_e, which must contain data and have no attributes.
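
A sketch of ROptional in context follows. The wrapping element parent, the class name OptionalRule and the helper validates are assumptions made for the example; only the ROptional rule itself comes from the text above.

```haxe
// Save as OptionalRule.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class OptionalRule {
    // <parent> (a made-up wrapper) may contain zero or one <child_e> holding text.
    public static var rule:Rule = RNode("parent", [], ROptional(RNode("child_e", [], RData())));

    // Hypothetical helper: true when `source` satisfies `rule`.
    public static function validates(source:String, rule:Rule):Bool {
        var ok = true;
        try {
            Check.checkDocument(Xml.parse(source), rule);
        } catch (e:Dynamic) {
            ok = false;
        }
        return ok;
    }

    static function main() {
        trace(validates('<parent><child_e>text</child_e></parent>', rule)); // traces true
        trace(validates('<parent/>', rule)); // traces true: the child is optional
        // Two children are one too many, so this traces false.
        trace(validates('<parent><child_e>a</child_e><child_e>b</child_e></parent>', rule));
    }
}
```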

RChoice

Choose one of these rules:

var choice1 = RNode("child_c",[],RData());
var choice2 = RNode("child_d",[],RData());
var myRule = RChoice([choice1,choice2]);

This one says there can be either a child_c or a child_d node.
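
As an illustration, the sketch below wraps the RChoice rule in a made-up parent element and checks three documents. The names ChoiceRule, parent and validates are assumptions for the example.

```haxe
// Save as ChoiceRule.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class ChoiceRule {
    // <parent> (a made-up wrapper) must contain either <child_c> or <child_d>.
    public static var rule:Rule = RNode("parent", [], RChoice([
        RNode("child_c", [], RData()),
        RNode("child_d", [], RData())
    ]));

    // Hypothetical helper: true when `source` satisfies `rule`.
    public static function validates(source:String, rule:Rule):Bool {
        var ok = true;
        try {
            Check.checkDocument(Xml.parse(source), rule);
        } catch (e:Dynamic) {
            ok = false;
        }
        return ok;
    }

    static function main() {
        trace(validates('<parent><child_c>x</child_c></parent>', rule)); // traces true
        trace(validates('<parent><child_d>y</child_d></parent>', rule)); // traces true
        trace(validates('<parent><child_x>z</child_x></parent>', rule)); // traces false: not one of the choices
    }
}
```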

RList

Validate all the given rules, either in the given order or in any order (second argument):

var myRule = RList([
            RNode("child_a"),
            RNode("child_b"),
            RNode("child_e")
        ],
        false
    );

This rule says there should be three node elements, child_a, child_b and child_e, which can appear in any order.
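
The effect of the second argument is easiest to see side by side. In this sketch the same three child rules are used once unordered and once ordered; the wrapper element parent, the class name ListRule and the helper validates are assumptions for the example.

```haxe
// Save as ListRule.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class ListRule {
    // Three empty elements, as in the rule above.
    static function items():Array<Rule> {
        return [RNode("child_a"), RNode("child_b"), RNode("child_e")];
    }

    // <parent> is a made-up wrapper element.
    public static var unordered:Rule = RNode("parent", [], RList(items(), false));
    public static var ordered:Rule = RNode("parent", [], RList(items(), true));

    // Hypothetical helper: true when `source` satisfies `rule`.
    public static function validates(source:String, rule:Rule):Bool {
        var ok = true;
        try {
            Check.checkDocument(Xml.parse(source), rule);
        } catch (e:Dynamic) {
            ok = false;
        }
        return ok;
    }

    static function main() {
        var shuffled = '<parent><child_b/><child_a/><child_e/></parent>';
        var inOrder = '<parent><child_a/><child_b/><child_e/></parent>';
        trace(validates(shuffled, unordered)); // traces true
        trace(validates(shuffled, ordered)); // traces false: child_b comes before child_a
        trace(validates(inOrder, ordered)); // traces true
        trace(validates('<parent><child_a/><child_b/></parent>', unordered)); // traces false: child_e is missing
    }
}
```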

RMulti

The rule can apply many times, optionally requiring at least one occurrence (second argument):

var myRule = RMulti(RNode("my_element",[],RData()),true);

This rule says there can be as many my_element nodes as you wish, but at least one.
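
The sketch below contrasts the two values of the second argument. The wrapper element parent, the class name MultiRule and the helper validates are assumptions for the example.

```haxe
// Save as MultiRule.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class MultiRule {
    // <parent> (a made-up wrapper) holds one or more <my_element> nodes.
    public static var atLeastOne:Rule = RNode("parent", [], RMulti(RNode("my_element", [], RData()), true));
    // Same rule, but zero occurrences are fine too.
    public static var anyNumber:Rule = RNode("parent", [], RMulti(RNode("my_element", [], RData()), false));

    // Hypothetical helper: true when `source` satisfies `rule`.
    public static function validates(source:String, rule:Rule):Bool {
        var ok = true;
        try {
            Check.checkDocument(Xml.parse(source), rule);
        } catch (e:Dynamic) {
            ok = false;
        }
        return ok;
    }

    static function main() {
        var three = '<parent><my_element>a</my_element><my_element>b</my_element><my_element>c</my_element></parent>';
        trace(validates(three, atLeastOne)); // traces true
        trace(validates('<parent/>', atLeastOne)); // traces false: at least one is required
        trace(validates('<parent/>', anyNumber)); // traces true
    }
}
```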

RData

As you can see, RData takes a single optional parameter. RData means plain data, that is, plain text inside the XML. The single parameter is a Filter, a powerful type for validation. There are several filters, but mark my words, the most used will be FReg, which validates text against a regular expression. But I'm going too quick, let's get back to the matter at hand.
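
Before getting to FReg, here is a sketch of RData without a filter and with the simpler FInt and FBool filters (both are used in the full example at the end of this page). The element name value, the class name DataRule and the helper validates are assumptions for the example.

```haxe
// Save as DataRule.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class DataRule {
    // <value> is a made-up element name.
    public static var anyText:Rule = RNode("value", [], RData());
    public static var intOnly:Rule = RNode("value", [], RData(FInt));
    public static var boolOnly:Rule = RNode("value", [], RData(FBool));

    // Hypothetical helper: true when `source` satisfies `rule`.
    public static function validates(source:String, rule:Rule):Bool {
        var ok = true;
        try {
            Check.checkDocument(Xml.parse(source), rule);
        } catch (e:Dynamic) {
            ok = false;
        }
        return ok;
    }

    static function main() {
        trace(validates('<value>abc</value>', anyText)); // traces true: any text will do
        trace(validates('<value><b/></value>', anyText)); // traces false: an element is not plain data
        trace(validates('<value/>', anyText)); // traces false: the data is missing
        trace(validates('<value>123</value>', intOnly)); // traces true
        trace(validates('<value>abc</value>', intOnly)); // traces false
        trace(validates('<value>true</value>', boolOnly)); // traces true
        trace(validates('<value>maybe</value>', boolOnly)); // traces false
    }
}
```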

Validation Explained

Basically, you are telling Haxe to validate your XML with a rule that in plain English would read something like:
"My XML should have a node called myElement, with no attributes, and another node called myChildElement, child of the previous one and also with no attributes. The child of this child element should be plain text". Pretty easy, huh?

Filtering Data

So far we have not done anything exceptional, but let's get messy. Maybe you want the data in myChildElement to consist only of digits; maybe it is a part number and you want to make sure it contains exactly 3 digits, no more and no less. For this we rewrite the rule as:

  myRule = RNode('myElement',[],
              RNode('myChildElement',[],RData(FReg(~/^[0-9]{3}$/))));

What we have done is give the RData rule a parameter: a filter, more specifically a regular expression filter. (If you don't know what a regular expression is, you should read up on it, as it is the coolest thing since sliced bread.) The FReg filter takes one parameter, the regular expression, which here says that the data must be exactly 3 digits (be careful to close all the parentheses that you opened). Now if you write anything other than 3 digits in myChildElement and try to validate it with haxe.xml.Check.checkDocument, you will get an exception. FReg is a powerful way to ensure the data follows a given format, as regular expressions can validate pretty much anything. You want an MD5 sum in it (a 32-digit hexadecimal number)? Then use:

  FReg(~/^[A-Fa-f0-9]{32}$/) 

as the parameter for your RData rule and you're set. Not very good with regular expressions? There are a lot of ready-made regexes on the web to validate pretty much anything.
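
The two regular expressions above can be tried out with a sketch like the following. It also applies a filter to an attribute, using FEnum (which the full example below uses as well). The class name FilterRules, the helper validates, and the item element with its color attribute are assumptions for the example.

```haxe
// Save as FilterRules.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class FilterRules {
    // Exactly three digits inside myChildElement.
    public static var partNumber:Rule = RNode('myElement', [],
        RNode('myChildElement', [], RData(FReg(~/^[0-9]{3}$/))));
    // A 32-digit hexadecimal number inside myChildElement.
    public static var md5:Rule = RNode('myElement', [],
        RNode('myChildElement', [], RData(FReg(~/^[A-Fa-f0-9]{32}$/))));
    // Filters also work on attributes: color must be one of the listed values.
    // (item and color are made-up names.)
    public static var color:Rule = RNode('item', [Attrib.Att('color', FEnum(['red', 'green']))]);

    // Hypothetical helper: true when `source` satisfies `rule`.
    public static function validates(source:String, rule:Rule):Bool {
        var ok = true;
        try {
            Check.checkDocument(Xml.parse(source), rule);
        } catch (e:Dynamic) {
            ok = false;
        }
        return ok;
    }

    // Wraps a value in the two elements used throughout this page.
    public static function wrap(value:String):String {
        return '<myElement><myChildElement>' + value + '</myChildElement></myElement>';
    }

    static function main() {
        trace(validates(wrap('123'), partNumber)); // traces true
        trace(validates(wrap('1234'), partNumber)); // traces false: four digits
        trace(validates(wrap('12a'), partNumber)); // traces false: not all digits
        trace(validates(wrap('d41d8cd98f00b204e9800998ecf8427e'), md5)); // traces true
        trace(validates('<item color="red"/>', color)); // traces true
        trace(validates('<item color="blue"/>', color)); // traces false: not in the list
    }
}
```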

For now, this is all I will cover, but expect more in the following days.

Full example

Here's an example of a simple XML check that uses all the rules explained above and all the filters except FReg.
As detailed exceptions are thrown when a document does not validate, you can edit the document or the rules to test it thoroughly.

The XML file (my_service.xml, the name requested by Main.hx below):

<?xml version="1.0" encoding="UTF-8"?>
<first_element 
        first_attribute_one = "something" 
        first_attribute_two = "something else">
    <child_one child_attribute="something more">
        <child_a>true</child_a>
        <child_b>123</child_b>
        <child_c>Hello</child_c>
        <child_e>haxe me baby!</child_e>
    </child_one>
    <child_one child_attribute="some other thing">
        <child_a>false</child_a>
        <child_b></child_b>
        <child_d>world</child_d>
    </child_one>
</first_element>

The Haxe file (Main.hx):

import haxe.xml.Check;
class Main{
function onComplete(the_data:String)
{
    /********** Da rule (FUN!) ********************/
    // either child_c or child_d
    var choice1 = RNode("child_c",[],RData());
    var choice2 = RNode("child_d",[],RData());
    // a list of children
    var childs2 = RList(
        [
            // Node element child_a should contain a boolean
            RNode("child_a"    ,[],RData(FBool)),
            //child_b can be empty and if not should contain an Integer
            RNode("child_b"    ,[],ROptional(RData(FInt))),
            // choices 
            RChoice([choice1,choice2]),
            // child_e may or may not be present
            ROptional(RNode("child_e",[],RData()))
        ],
        // nodes should appear in proper order
        true 
    );
    // child_one rule
    var childs1 = RNode("child_one",
        [Attrib.Att("child_attribute")],
        childs2
    );
    // first_element can have one or many child_one
    var mChilds = RMulti(childs1,true);
    // the values attributes can have
    var attVal = FEnum(["something",
        "something else",
        "some thing new"
    ]);
    // the list of first_element attributes
    var attribs = [
        Attrib.Att("first_attribute_one",attVal),
        Attrib.Att("first_attribute_two",attVal)
    ];
    // the first element's rule (the main one)
    var daRule = RNode("first_element",attribs,mChilds);
    /**************************************/
    // parse the string and get the first element 
    var first_element = Xml.parse(the_data).firstElement();
    try {
        //check the rule on the node as we got the first element
        haxe.xml.Check.checkNode(first_element,daRule);
        trace("Cool ! ");
    } catch (m:String) {
        trace("NOT Cool ! "+m);
    }
}
//////////////// lets get that file (no fun...) ////////////
private var _call : haxe.Http;
public function new ()
{
    // Da URL where da XML is waiting for you
    var service = "http://localhost/testHaxeXMLcheck/my_service.xml";
    _call = new haxe.Http(service);
    _call.onData = onComplete;
    _call.onError = onError;
    _call.request(false);
}

function onError(msg:String) {
    trace("ERROR "+msg);
}

static function main() {
    var m:Main =new Main();
}
////////////////////////////////////////////////////////////
}

The compiler file (.hxml):

-swf index.swf
-swf-version 9
-main Main

Compile, run, have fun !
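
If you want to experiment with the rules without setting up a web server, a sketch like the following validates an inline string instead of loading the file over HTTP. It is not part of the example above: the class name InlineCheck, the helper errorFor and the shortened sample document are made up for illustration, while the rules are the same as in Main.hx, only written more compactly.

```haxe
// Save as InlineCheck.hx (the class name is an arbitrary choice).
import haxe.xml.Check;

class InlineCheck {
    // A shortened version of the document above, with a single child_one.
    public static var sample:String =
        '<first_element first_attribute_one="something" first_attribute_two="something else">'
        + '<child_one child_attribute="something more">'
        + '<child_a>true</child_a><child_b>123</child_b><child_c>Hello</child_c>'
        + '</child_one>'
        + '</first_element>';

    // Same rules as in Main.hx.
    public static function buildRule():Rule {
        var attVal = FEnum(["something", "something else", "some thing new"]);
        var children = RList([
            RNode("child_a", [], RData(FBool)),
            RNode("child_b", [], ROptional(RData(FInt))),
            RChoice([RNode("child_c", [], RData()), RNode("child_d", [], RData())]),
            ROptional(RNode("child_e", [], RData()))
        ], true);
        var childOne = RNode("child_one", [Attrib.Att("child_attribute")], children);
        return RNode("first_element", [
            Attrib.Att("first_attribute_one", attVal),
            Attrib.Att("first_attribute_two", attVal)
        ], RMulti(childOne, true));
    }

    // Hypothetical helper: null when `source` validates, otherwise the
    // text of whatever was thrown.
    public static function errorFor(source:String):String {
        try {
            Check.checkNode(Xml.parse(source).firstElement(), buildRule());
        } catch (e:Dynamic) {
            return Std.string(e);
        }
        return null;
    }

    static function main() {
        // The sample validates, so there is no error to report.
        trace(errorFor(sample));
        // Break the document: child_a no longer holds a boolean.
        var broken = StringTools.replace(sample, "<child_a>true", "<child_a>maybe");
        trace(errorFor(broken));
    }
}
```

Editing the sample string or the rules and running it again is a quick way to read the different error messages.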

version #8135, modified 2010-02-15 13:11:01 by mpe