String Encoding

String encoding depends on the platform, as detailed below:

  • Flash (all versions): strings are always UTF-8 encoded. Constant strings in .hx files that are not valid UTF-8 are re-encoded as UTF-8. This works well for ISO-8859-1 (Latin-1) strings, for instance.
  • JavaScript: the string encoding depends on the HTML page, or may be forced by the browser itself. Constant strings in .hx files are left as-is. Browsers seem to convert ISO-encoded text appropriately, but it is safer to save your .hx files in the same encoding your website uses.
  • Neko: strings are binary strings; they can contain any 8-bit character. This means that if you have a UTF-8 string, all String class operations work on its bytes, not on its UTF-8 characters. Use the neko.Utf8 API to manipulate UTF-8 strings instead. Constant strings in .hx files are left as-is.
  • PHP: same as Neko, except that you must use php.Utf8 instead of neko.Utf8 to manipulate UTF-8 strings.
  • C++: same as Neko. Strings are arrays of bytes; the programmer must keep track of whether they are UTF-8 encoded, and can use cpp.Utf8 to convert. String constants written in UTF-8 in the source code remain UTF-8 encoded, so their byte size (String.length) may be larger than their number of characters. String member functions (lengths, indices, etc.) refer to byte positions, not character positions.
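The byte-versus-character distinction described for Neko, PHP and C++ can be illustrated with a short sketch. It is written in Python rather than Haxe, and a Python `bytes` value stands in for a byte-oriented Haxe String on those targets. `utf8_length` and `utf8_sub` are hypothetical helpers that only approximate the kind of operation neko.Utf8, php.Utf8 or cpp.Utf8 provide; they are not those APIs.

```python
# Sketch only: a Python `bytes` value stands in for a byte-oriented Haxe
# String on Neko, PHP or C++. The helpers below are hypothetical and are
# not the neko.Utf8 / php.Utf8 / cpp.Utf8 API.

text = "héllo wörld"            # 11 characters
data = text.encode("utf-8")     # the bytes such a String would hold

byte_length = len(data)         # 13: 'é' and 'ö' take two bytes each


def utf8_length(raw: bytes) -> int:
    """Count characters by skipping UTF-8 continuation bytes (0b10xxxxxx)."""
    return sum(1 for b in raw if b & 0xC0 != 0x80)


def utf8_sub(raw: bytes, pos: int, length: int) -> bytes:
    """Character-based substring: decode, slice by code point, re-encode."""
    return raw.decode("utf-8")[pos:pos + length].encode("utf-8")


# A byte-based substring can cut a multi-byte character in half:
broken = data[0:2]              # b"h\xc3", i.e. 'h' plus half of 'é'
# A character-based one keeps whole characters:
whole = utf8_sub(data, 0, 2)    # b"h\xc3\xa9", i.e. "hé"
```

Slicing `data` by byte position is what a plain String operation does on these targets; decoding first, as `utf8_sub` does, is one simple way to get character positions at the cost of a full decode.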

You can use the cross-platform haxe.io.Bytes.ofString to read the actual bytes of a String, whatever its encoding.
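Looking at the raw bytes of the same text in two encodings shows why the platforms disagree. The following Python sketch (not Haxe) dumps the bytes of a string encoded as UTF-8 and as ISO-8859-1. It also includes a hypothetical `to_utf8` helper that mimics the Flash rule for constant strings, under the assumption that any source text which is not valid UTF-8 is ISO-8859-1; the compiler's actual logic may differ.

```python
# Sketch only (Python, not Haxe): inspect the raw bytes behind a string.

def byte_dump(raw: bytes) -> str:
    """Render bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in raw)


def to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: keep valid UTF-8 as-is, otherwise assume the
    bytes are ISO-8859-1 and re-encode them as UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode("latin-1").encode("utf-8")


word = "hé"
utf8_bytes = word.encode("utf-8")       # 68 c3 a9
latin1_bytes = word.encode("latin-1")   # 68 e9

print(byte_dump(utf8_bytes))
print(byte_dump(latin1_bytes))
print(byte_dump(to_utf8(latin1_bytes)))  # 68 c3 a9 again
```

Note that this heuristic leaves untouched any ISO-8859-1 byte sequence that happens to also be valid UTF-8, so it is only a best-effort guess.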

The \0 char

This section details whether strings can contain the character code 0 on each platform. In any case, prefer the haxe.io.Bytes API when manipulating binary data; it gives more consistent cross-platform behavior.

  • Flash: strings cannot contain the 0 character.
  • Neko: strings are binary; they can contain \0 characters and can still be concatenated. Printing, however, stops at the first \0 character, while other operations (substr, length, etc.) work correctly.
  • JavaScript: this depends on the browser. Internet Explorer does not allow the 0 character, while other browsers do.
  • PHP: strings are binary and can contain \0 characters. All operations on such strings should work as expected.
  • C++: strings may contain the null character, and String.length is authoritative. However, some external functions (e.g. those taking filenames, printing, etc.) won't be able to see past the null character.
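The difference between a length-counted string and a NUL-terminated C string, which is what printing on Neko and external functions on C++ run into, can be sketched in Python with the standard ctypes module. A `bytes` value keeps its \0 byte and its full length, while the NUL-terminated view that ctypes exposes stops at the first \0. This illustrates the general mechanism only, not any Haxe target's actual implementation.

```python
# Sketch only (Python, not Haxe): length-counted bytes vs a C string view.
import ctypes

data = b"abc\x00def"            # 7 bytes, with a NUL in the middle

# Length-counted operations see every byte, including the NUL.
full_length = len(data)          # 7
joined = data + b"!"             # concatenation keeps the NUL: b"abc\x00def!"
tail = data[4:]                  # b"def": slicing works past the NUL

# A C-style consumer reads only up to the first NUL.
buf = ctypes.create_string_buffer(data)   # copies data and appends a NUL
c_view = buf.value                        # b"abc"
raw_view = buf.raw                        # b"abc\x00def\x00"
```

Passing `data` to anything that expects a C string behaves like `c_view`, which is why binary payloads are better kept in a length-counted container such as haxe.io.Bytes.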

version #15769, modified 2012-12-06 17:09:33 by ppelleti