Which Iterable Type Should I Use?

Iterables are powerful and lightweight methods for handling collections of objects. Several Haxe datatypes that qualify as Iterable types include Arrays, Hashes, and Lists.

The question many programmers face is which Iterable type to use in a given situation... i.e. which data type loops the fastest? Which uses the least memory? Which allows for flexible expansion/deletion of elements? There is no 'best' data type for any arbitrary task, so it's up to the programmer to figure out which datatypes fit their task the best.

However, Haxe is unique in that the relative performance characteristics of an Iterable can change not only in terms of the task, but also in terms of the target platform. In other words, even though a given Iterable may typically be 'best' at a given task on one platform, this may or may not be the case on other target platforms. Managing this situation without duplicating code is complicated, but possible thanks to Haxe's flexible typing system and conditional compilation options.

Finally, even though Haxe targets a multitude of platforms, tasks, and goals, the development community gives a higher priority to developing fast executing code than virtually any other aspect of code metrics. Techniques like inlining functions can speed method execution up significantly, and this more than outweighs the issue of extra memory needed to duplicate relevant bytecode several times in the executable. With this in mind, most developers want to know... "What datatype is the fastest to iterate?" However, there's special consideration for implementing Iterables in a few of the Haxe targets... most notably Adobe's SWF virtual machine.

Arrays and Linked Lists


Most of the options for Iterables boils down to whether you are implementing an array, or linked list. The well-worn arguments for and against each type won't be repeated here. Instead, it is worth discussing how the platforms differ in terms of their Iterable performance, and how Haxe can be used to flexibly implement "best performing" Iterables using conditional compilation.

SWF: The White Elephant


Normally, arrays are the default method for storing collections of elements. Only in exotic or specialized contexts will linked lists be used. However, the SWF virtual machine has an untyped array implementation as noted here (under Array). This means that arrays are not sized optimally according to the data that they contain, and as such, are slower and larger than they should be.

Nicolas discusses this in more detail on his blog, and comes to the conclusion that linked-list types are much faster at iteration than arrays in SWF. For most people, the extra overhead of the linked-list datastructure is well worth the faster execution of the resulting code. So, Lists seem to be a solid choice for developing speedy Flash applications, and perhaps more developers should use the List datatype available in Haxe.

Not So Fast, List


However, the Haxe implementation of List uses arrays to store elements. Unfortunately, it is still one of the slowest iterating datastructures in SWF. The List datastructure uses those arrays to save a bit of space. This makes them more efficient on other platforms, but a poor all around choice on the Flash platform.

FastList: Appropriately Named


Nicolas realized that Typed field accesses were the fastest possible lookup in the SWF virtual machine. This led him to create the FastList datatype, which stores the linked elements using field objects, rather than arrays. As predicted, FastList thoroughly beats every other datatype when it comes to iteration speed.

List and FastList: Irreconcilable Differences


Several other Haxe developers were interested in reconciling the two list types. Could a typedef possibly switch between two different List types at compile time? That way, the lighter array based List could be used in php, js, etc., while the field object based FastList could be used on Flash.

Unfortunately, this was not possible either. It is currently impossible to override the base List implementation in Haxe. Furthermore, FastList and List differ greatly in the behavior of their default add() and pop() methods. While List add()s and pop()s from the tail of the list, FastList add()s and pop()s from the head of the list. Trying to unify them both with a simple typedef switch would be an exercise in confusion.

Other Approaches: SugarList


The original author of this article has come up with an approach at resolving the List issue in the Flash environment. While it is still not complete, it is in a workable state suitable for experimentation.

A SugarList extends a List in a normal fashion. However, then it proceeds to declare field object link references (like FastList), and override every base List method so that all the methods work identically. Since it is an extension, SugarList will work everywhere a normal List would, except that it will be faster to iterate on the SWF platform.

This is only part of the solution though. The general purpose Lambda methods are typically used to manipulate Lists. But, once Lambda is used on SugarList, it will create the slower base List class as a result.

What was needed was a new Static utility class that implemented Lambda functions for the newer, faster, SugarList class. So, ListTools was created as a way to manage List based manipulations... both for SugarList and List classes. ListTools is a package of methods used for creating lists from different types of Iterables. However, ListTools has an optional compiler argument -D sugarlist that will change the list result types from the base List class to the new SugarList class.

Arrays on Other Haxe Targets


Nicolas also discussed Array specifics on other platforms:
  • Usually, an Array needs to copy the whole structure when adding/removing and element (which is O(n)) whereas a List has O(1) add.
  • Sadly, there is no way to have "flat Arrays" in JS/Flash so they actually implemented has Hashtables natively, with different complexity of accesses.
  • In Neko, we use a fast growing Array so we can reach a O(1) add, at the expense of more memory usage. If you often add/remove elements, Neko Lists are better, whereas if you have a given static structure, Array is better.

He prefers List because you can .remove() items on it while iterating without skipping any item (which is not the case for Array, since it's index-based).

Stax


The Stax library implements head-tail based lists which can be found in most functional languages.
This means you never have to copy a list. Adding the first element or dropping it is always O(1) - but adding or removing the last element is always O(n) (where n is the list size - because all head-tail items have to be duplicated which actually clones the list)

fastest iterators (> 10 elements, Exception based)


Depends on your use case and target. You can think about many different implementations than found in Std.
The Python style Exception based iterators are fastest in most cases if you have more than 10 elements. Using them only pays off because catching the final Exception can be quit expensive - worse: It may depend on stack depths. However you almost always have a stack of less than 50 calls which doesn't seem to hurt too much. What's the difference?
  • no intermediate structures when using .map(..).filter(...).map(...)
    So be careful.
      var x = list.toEiter.map(x) 
     

    will not execute any code. So you may want to consider doing this in some cases:
      var x = list.clone().toEiter.map(x); // copy the list because .map will not do so
     
  • only one function is used ever. The function is the iterator itself.
    (Eg compare to the Std implementation which has iter.next() and iter.hasNext())
  • shortest and most readable code because you only have to implement one function and you usually have to care about the end-of-iterator case only once using one try .. catch

One implementation can be found here:
http://github.com/MarcWeber/haxe-std-tiny/blob/master/src/tiny/EIterExt.hx

version #15754, modified 2012-12-02 11:03:20 by ppelleti