Hi
I am trying to simplify this as much as possible (I hope everything is correct).
Unicode strings in Windows are encoded as UTF-16LE.
That means 2 bytes (one 16-bit code unit) for code points <= &hFFFF (65535),
but 4 bytes (two code units) for code points >= &h10000 (65536); such a two-unit sequence is known as a surrogate pair.
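To make the 4-byte case concrete, here is a small Python sketch of how a code point above &hFFFF is split into its two 16-bit units. The function name to_surrogate_pair is only an illustration (it is not part of any library), and Python is used here just because it is easy to run; the arithmetic is the same in any language.

```python
def to_surrogate_pair(code_point):
    """Hypothetical helper: split a code point above U+FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16LE encoder (low byte first in each unit).
encoded = chr(0x1F600).encode("utf-16-le")
print(encoded.hex())
```

In UTF-16LE each of the two units is stored low byte first, which is why the encoder output has the bytes of each surrogate swapped compared with the hexadecimal values printed on the first line.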
Even though these extended code points are not very frequent in normal usage, they can exist,
and in that case some functions working with Unicode
have to be adapted to handle that possibility (if not, the risk is to get bad characters).
The standard string functions that do not work "correctly" with these extended code points
(the list is probably not exhaustive) are the following:
len ; mid ; left ; right ; asc ; wchr
but also more sophisticated functions like
reverse, parse, parsecount, split ...
In fact, any kind of operation that counts characters or uses a position in the string may be affected by surrogate pairs.
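As a rough illustration of why counting and reversing go wrong, the Python sketch below works on the raw 16-bit units, which is how a unit-based len or reverse sees the string. The helpers utf16_units and is_well_formed are hypothetical names made up for this example.

```python
import struct


def utf16_units(text):
    """Hypothetical helper: return the 16-bit UTF-16 code units of text as a list."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def is_well_formed(units):
    """Hypothetical helper: True if every high surrogate is directly followed by a low one."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:                     # high surrogate
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:                   # low surrogate with no high before it
            return False
        else:
            i += 1
    return True


text = "a\U0001F600b"                 # 3 characters, the middle one is above &hFFFF
units = utf16_units(text)
print(len(text), len(units))          # characters versus 16-bit units

reversed_units = units[::-1]          # what a unit-by-unit reverse would produce
print(is_well_formed(units), is_well_formed(reversed_units))
```

A unit-based length reports one more than the number of characters here, and a unit-by-unit reverse puts the low surrogate before the high one, which is no longer valid UTF-16.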
So, when working with Unicode using the normal functions,
be sure you are not using the extended code points (that is, you are only using UCS-2).
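One possible way to check that condition before calling the normal functions is sketched below in Python. The name is_ucs2_only is an assumption for this example, not an existing function.

```python
def is_ucs2_only(text):
    """Hypothetical helper: True if no character in text needs a surrogate pair."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(is_ucs2_only("h\u00e9llo \u20ac"))     # accented letter and euro sign, all <= &hFFFF
print(is_ucs2_only("a\U0001F600"))           # contains a code point >= &h10000
```

If the check fails, the string needs the surrogate-aware versions of the functions instead of the normal ones.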
marc
Hi,
To continue on the subject of UTF-16 and surrogate pairs:
here is a link to get a free Unicode font able to display the extended code points, those above the Basic Multilingual Plane (the ones that need surrogate pairs).
http://unifoundry.com/unifont.html
You can use this one:
Glyphs above the Unicode Basic Multilingual Plane: unifont_upper-9.0.01.ttf (1 Mbyte)
or
Glyphs above the Unicode Basic Multilingual Plane with CSUR PUA Glyphs: unifont_upper_csur-9.0.01.ttf (1 Mbyte)
On that web page you can also browse the full set of Unicode characters:
GNU Unifont Glyphs Unicode Basic Multilingual Plane
or
GNU Unifont Glyphs Unicode Supplementary Multilingual Plane
Marc