Abstract Syntax Tree and type.ml

Common.context.types contains the AST of the code. It's a list of module_type instances containing all defined modules. There are three kinds of modules: Classes, Enums and TypeDefs:

module_type =
    | TClassDecl of tclass
    | TEnumDecl of tenum
    | TTypeDecl of tdef

As you can see, each module_type constructor carries its own record type, which contains the definition of that kind of module. For example, a class is described by tclass:

tclass = {
    cl_path : path;
    cl_pos : Ast.pos;
    cl_doc : Ast.documentation;
    cl_private : bool;
    mutable cl_kind : tclass_kind;
    mutable cl_extern : bool;
    mutable cl_interface : bool;
    mutable cl_types : (string * t) list;
    mutable cl_super : (tclass * tparams) option;
    mutable cl_implements : (tclass * tparams) list;
    mutable cl_fields : (string, tclass_field) PMap.t;
    mutable cl_statics : (string, tclass_field) PMap.t;
    mutable cl_ordered_statics : tclass_field list;
    mutable cl_ordered_fields : tclass_field list;
    mutable cl_dynamic : t option;
    mutable cl_constructor : tclass_field option;
    mutable cl_init : texpr option;
    mutable cl_overrides : string list;
}

Like tenum and tdef, tclass contains all of the type's information: the inherited class, implemented interfaces, fields (properties and methods), static fields, genericity information, and so on.

Other type definitions from type.ml

path = string list * string is the path of the type. For example, the class mypack1.mypack2.MyClass has the path (["mypack1"; "mypack2"], "MyClass").
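
Putting module_type and path together, walking the list of types comes down to pattern matching on each element. The sketch below is not code from the compiler: it uses cut-down stand-in records that keep only a path, the field names e_path and t_path are assumptions (they are not shown on this page), and string_of_path and describe_module are hypothetical helpers.

```ocaml
(* Simplified stand-ins for the records in type.ml: only the path is kept.
   e_path and t_path are assumed field names. *)
type path = string list * string

type tclass = { cl_path : path }
type tenum = { e_path : path }
type tdef = { t_path : path }

type module_type =
  | TClassDecl of tclass
  | TEnumDecl of tenum
  | TTypeDecl of tdef

(* Hypothetical helper: render (["mypack1"; "mypack2"], "MyClass")
   as a dotted name. *)
let string_of_path ((pack, name) : path) = String.concat "." (pack @ [name])

(* Hypothetical helper: dispatch on the kind of module. *)
let describe_module = function
  | TClassDecl c -> "class " ^ string_of_path c.cl_path
  | TEnumDecl e -> "enum " ^ string_of_path e.e_path
  | TTypeDecl d -> "typedef " ^ string_of_path d.t_path

let () =
  let types = [
    TClassDecl { cl_path = (["mypack1"; "mypack2"], "MyClass") };
    TEnumDecl { e_path = ([], "Color") };
  ] in
  List.iter (fun m -> print_endline (describe_module m)) types
```

A real generator would do the same dispatch over the actual records and emit a whole class, enum or typedef in each branch.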

Ast.pos and Ast.documentation are position and documentation info.

tclass_kind contains information about the class such as genericity.

t represents the type of an expression in the AST. tparams is just a list of t.

type t =
    | TMono of t option ref
    | TEnum of tenum * tparams
    | TInst of tclass * tparams
    | TType of tdef * tparams
    | TFun of (string * bool * t) list * t
    | TAnon of tanon
    | TDynamic of t
    | TLazy of (unit -> t) ref

TEnum, TInst and TType are probably the easiest to understand. Each one carries the definition of the type itself and its type parameters. For example, class A<Ta> gives a TInst whose first component is the tclass of A and whose tparams list contains Ta as its only element.

TFun is the type of a function: a list of arguments (for each of them you have the name, the optionality flag and the type) followed by the return type. It is easy to confuse with TFunction, which has a complementary purpose: TFun is a constructor of t and describes the type of a function, while TFunction is a constructor of texpr_expr and holds the function expression itself (a tfunc).

TDynamic is the type of a Dynamic expression, with its optional type parameter.

TLazy is used during type inference: it holds a function that computes the type only when asked. To resolve the final type, call the follow function.
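
To give an idea of what resolving a type looks like, here is a minimal sketch of a follow-like function over a cut-down model of t. It is an illustration only, not the function from type.ml, which handles more cases. In this model TInst carries a class name instead of a tclass (an assumption made to keep the example self-contained), and TMono stands for a type variable that may or may not be bound yet.

```ocaml
(* A cut-down model of t with only the constructors needed here.
   TInst carries a class name instead of a tclass record. *)
type t =
  | TMono of t option ref
  | TInst of string * t list
  | TLazy of (unit -> t) ref

(* Unwrap bound monomorphs and lazy types until a concrete type
   (or an unbound monomorph) is reached. *)
let rec follow t =
  match t with
  | TMono r ->
    (match !r with
     | Some t' -> follow t'
     | None -> t)
  | TLazy f -> follow (!f ())
  | TInst _ -> t

let () =
  let int_t = TInst ("Int", []) in
  let mono = TMono (ref (Some int_t)) in
  let lazy_t = TLazy (ref (fun () -> mono)) in
  match follow lazy_t with
  | TInst (name, _) -> print_endline name
  | _ -> print_endline "unresolved"
```

A generator would normally call follow before matching on a type, so that it sees the resolved constructor rather than the wrapper.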

TAnon is an anonymous structure type, like the ones usually named through typedefs. tanon contains the field definitions and the a_status value, of type anon_status, described later. TAnon is also used to describe the type of the statics of both classes and enums.

The type used to define properties and methods is tclass_field:

tclass_field = {
    cf_name : string;
    mutable cf_type : t;
    cf_public : bool;
    cf_doc : Ast.documentation;
    cf_get : field_access;
    cf_set : field_access;
    cf_params : (string * t) list;
    mutable cf_expr : texpr option;
}

It contains the name (cf_name), type (cf_type), visibility (cf_public), documentation (cf_doc) and parameters (cf_params) of the field and, if the field is a method, its content (cf_expr).

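cf_get and cf_set have the type field_access and describe how the field is read and written. Besides NormalAccess, the most important constructor is MethodAccess, which names the getter (in cf_get) or the setter (in cf_set) of the field.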

type field_access =
    | NormalAccess
    | NoAccess
    | ResolveAccess
    | MethodAccess of string
    | MethodCantAccess
    | NeverAccess    
    | InlineAccess
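
As a rough illustration of how cf_get could drive code generation, the sketch below emits a read of obj.field. gen_read is a hypothetical helper, not a function from the compiler, and treating every constructor other than MethodAccess as a plain field read is a deliberate simplification.

```ocaml
type field_access =
  | NormalAccess
  | NoAccess
  | ResolveAccess
  | MethodAccess of string
  | MethodCantAccess
  | NeverAccess
  | InlineAccess

(* Hypothetical helper: emit the code that reads [field] on [obj].
   A MethodAccess getter becomes a call; everything else is treated
   as a plain field read (a simplification). *)
let gen_read obj field access =
  match access with
  | MethodAccess getter -> Printf.sprintf "%s.%s()" obj getter
  | _ -> Printf.sprintf "%s.%s" obj field

let () =
  print_endline (gen_read "p" "x" NormalAccess);
  print_endline (gen_read "p" "x" (MethodAccess "get_x"))
```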


texpr defines an expression written in Haxe, such as "var x = 1", "arr[5]" or "trace('hello hello!')".

texpr = {
    eexpr : texpr_expr;
    etype : t;
    epos : Ast.pos;
}

Each expression is defined by its type (etype), position in the text (epos) and its content (eexpr), of type texpr_expr, which represents the actual expression.

texpr_expr defines all the expressions available in Haxe: declarations, assignments, calls, binary and unary operations, arrays, flow control and exceptions.

and texpr_expr =
    | TConst of tconstant
    | TLocal of string
    | TEnumField of tenum * string
    | TArray of texpr * texpr
    | TBinop of Ast.binop * texpr * texpr
    | TField of texpr * string
    | TTypeExpr of module_type    
    | TParenthesis of texpr    
    | TObjectDecl of (string * texpr) list
    | TArrayDecl of texpr list
    | TCall of texpr * texpr list
    | TNew of tclass * tparams * texpr list
    | TUnop of Ast.unop * Ast.unop_flag * texpr
    | TFunction of tfunc
    | TVars of (string * t * texpr option) list
    | TBlock of texpr list    
    | TFor of string * t * texpr * texpr
    | TIf of texpr * texpr * texpr option
    | TWhile of texpr * texpr * Ast.while_flag
    | TSwitch of texpr * (texpr list * texpr) list * texpr option
    | TMatch of texpr * (tenum * tparams) * (int list * (string option * t) list option * texpr) list * texpr option
    | TTry of texpr * (string * t * texpr) list
    | TReturn of texpr option
    | TBreak
    | TContinue
    | TThrow of texpr

The biggest challenge for a target generator in the compiler is to produce "destination" source code for each one of these expressions.

As you can see, most of them are recursive: expressions are composed of other expressions. You'll see that the function gen_expr calls itself recursively to generate each of these expressions.

I recommend reading the chapter Pattern matching (on datatypes) from the OCaml tutorial. It introduces a recursive generator of arithmetic expressions, which is a small example of what the compiler (the gen_expr function) actually does.
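
In the same spirit, here is a small sketch of a recursive generator over a handful of constructors that mirror texpr_expr. It is a simplified model, not the compiler's gen_expr: constants are plain ints and operators plain strings (the real ones are tconstant and Ast.binop), there is no type or position information, and the output is built as a string.

```ocaml
(* A tiny expression type mirroring a few texpr_expr constructors.
   Constants are ints and operators are strings in this model. *)
type expr =
  | TConst of int
  | TLocal of string
  | TBinop of string * expr * expr
  | TParenthesis of expr
  | TCall of expr * expr list
  | TIf of expr * expr * expr option

(* Each case generates its own syntax and recurses into sub-expressions. *)
let rec gen_expr e =
  match e with
  | TConst i -> string_of_int i
  | TLocal name -> name
  | TBinop (op, e1, e2) -> gen_expr e1 ^ " " ^ op ^ " " ^ gen_expr e2
  | TParenthesis e1 -> "(" ^ gen_expr e1 ^ ")"
  | TCall (f, args) ->
    gen_expr f ^ "(" ^ String.concat ", " (List.map gen_expr args) ^ ")"
  | TIf (cond, e1, None) -> "if (" ^ gen_expr cond ^ ") " ^ gen_expr e1
  | TIf (cond, e1, Some e2) ->
    "if (" ^ gen_expr cond ^ ") " ^ gen_expr e1 ^ " else " ^ gen_expr e2

let () =
  let e =
    TCall (TLocal "f",
           [TBinop ("+", TLocal "x", TConst 1); TParenthesis (TLocal "y")]) in
  print_endline (gen_expr e)  (* prints: f(x + 1, (y)) *)
```

Supporting a new target then mostly means deciding, constructor by constructor, what text each case should produce.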

Other constructs

The TAnon constructor of t carries a tanon, which represents an anonymous object. Its a_status field has the type anon_status:

and anon_status =
    | Closed
    | Opened
    | Const
    | Statics of tclass
    | EnumStatics of tenum

The Closed, Opened and Const constructors are used internally by the typer, so don't worry about them for now. An anonymous object is Opened while it is not fully typed (we can still add fields to it). It becomes Closed once the type is finalized.

Const means that we have a constant expression like { x : 0, y : -1 }. It enables the typer to report "... has extra field y" when only x is required.

The other two are probably more interesting. Statics means that the anonymous type represents the static members of a class. For instance:

class Test {
    public static var MYVAR : String;
}

will be typed as a TAnon that contains the MYVAR field, with its status being Statics of the class Test. It's then wrapped in a typedef named "#Test".

The same applies to an enum E, except that the status is EnumStatics instead.
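
A generator can tell these cases apart by matching on the status. The sketch below assumes cut-down tclass and tenum records that keep only a path (e_path is an assumed field name), and statics_typedef_name is a hypothetical helper that rebuilds the "#Name" convention described above.

```ocaml
(* Stand-in records: only the path is kept; e_path is an assumed name. *)
type tclass = { cl_path : string list * string }
type tenum = { e_path : string list * string }

type anon_status =
  | Closed
  | Opened
  | Const
  | Statics of tclass
  | EnumStatics of tenum

(* Hypothetical helper: name of the wrapping typedef, if the anonymous
   type stands for the statics of a class or an enum. *)
let statics_typedef_name = function
  | Statics c -> Some ("#" ^ snd c.cl_path)
  | EnumStatics e -> Some ("#" ^ snd e.e_path)
  | Closed | Opened | Const -> None

let () =
  match statics_typedef_name (Statics { cl_path = ([], "Test") }) with
  | Some name -> print_endline name
  | None -> print_endline "plain anonymous object"
```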

Functions are largely self-explanatory:

and tfunc = {
    tf_type : t;
    tf_args : (string * tconstant option * t) list;
    tf_expr : texpr;
}

A function has a return type (tf_type), a list of parameters (tf_args) and a body made of instructions (tf_expr). Each parameter has a name, an optional constant (the tconstant option, which is how optional parameters show up) and a type.
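
As an illustration, the sketch below prints a Haxe-like signature from a tfunc-shaped record. It is a simplified model: types and the body are plain strings and constants are ints, whereas the real fields are t, tconstant and texpr. It also assumes that the optional constant is the default value of the parameter. gen_arg and gen_function are hypothetical helpers.

```ocaml
(* Stand-in record: types and body are strings, constants are ints. *)
type tfunc = {
  tf_type : string;
  tf_args : (string * int option * string) list;
  tf_expr : string;
}

(* Hypothetical helper: render one argument, with its default if any. *)
let gen_arg (name, default, ty) =
  match default with
  | None -> name ^ " : " ^ ty
  | Some c -> name ^ " : " ^ ty ^ " = " ^ string_of_int c

(* Hypothetical helper: render the whole function. *)
let gen_function name f =
  Printf.sprintf "function %s(%s) : %s %s"
    name
    (String.concat ", " (List.map gen_arg f.tf_args))
    f.tf_type
    f.tf_expr

let () =
  let f = {
    tf_type = "Int";
    tf_args = [("a", None, "Int"); ("b", Some 0, "Int")];
    tf_expr = "{ return a + b; }";
  } in
  print_endline (gen_function "add" f)
```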

version #19737, modified 2013-09-08 19:20:40 by JLM