BorlandTalk.com Borland discussion newsgroups
Dmitri Oulitski Guest
|
Posted: Wed Jul 09, 2003 3:37 pm Post subject: UTF8Encode, UTF8Decode and surrogates |
|
|
Do UTF8Encode and UTF8Decode support surrogates?
If not, how can I add support for surrogates?
Thank you,
Dmitri Oulitski
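
One way to add surrogate support is to combine each high/low surrogate pair into a single code point before producing the UTF-8 bytes. The following is a minimal sketch in Python rather than Delphi, only to show the arithmetic. The function name `utf16_units_to_utf8` is hypothetical, and rejecting unpaired surrogates is an assumption about the desired behaviour.

```python
def utf16_units_to_utf8(units):
    """Encode a sequence of UTF-16 code units (ints) as UTF-8 bytes.

    Hypothetical helper: pairs surrogates first, rejects unpaired ones.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        # A high surrogate followed by a low surrogate forms one code point.
        if (0xD800 <= cp <= 0xDBFF and i < len(units)
                and 0xDC00 <= units[i] <= 0xDFFF):
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xD800 <= cp <= 0xDFFF:
            raise ValueError("unpaired surrogate %04X" % cp)

        if cp < 0x80:                      # 1 byte
            out.append(cp)
        elif cp < 0x800:                   # 2 bytes
            out += bytes([0xC0 | (cp >> 6),
                          0x80 | (cp & 0x3F)])
        elif cp < 0x10000:                 # 3 bytes
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:                              # 4 bytes, reached only via a pair
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)
```

For example, `utf16_units_to_utf8([0xD800, 0xDF02])` returns the four bytes F0 90 8C 82, while a lone `0xD800` raises `ValueError`.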
FL Guest
|
Posted: Wed Jul 09, 2003 4:22 pm Post subject: Re: UTF8Encode, UTF8Decode and surrogates |
|
|
UTF-8 is just an encoding method and inherently supports the characters that
UTF-16 represents as surrogate pairs, provided your app handles them.
Francisco
Dmitri Oulitski wrote:
| Quote: |
Do UTF8Encode and UTF8Decode support surrogates?
Danny Heijl Guest
|
Posted: Wed Jul 09, 2003 7:34 pm Post subject: Re: UTF8Encode, UTF8Decode and surrogates |
|
|
On Windows you could try WideCharToMultiByte with a codepage of CP_UTF8.
Danny
---
"Dmitri Oulitski" <dmitri.ulitski (AT) nimbuspartners (DOT) com> wrote in message
news:3f0c46a9$1 (AT) newsgroups (DOT) borland.com...
| Quote: | According to the Unicode specification, character U+10302
is encoded as D800 DF02 in UTF-16 and should be encoded as F0 90 8C 82
in UTF-8.
But Delphi encodes this character as ED A0 80 ED BC 82 in UTF-8,
i.e. Delphi treats each UTF-16 code unit separately,
even though the code unit sequence is a surrogate pair and should be treated
as one character.
I think this is a bug and Delphi should support surrogate pairs.
Regards,
Dmitri Oulitski
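
The byte values in the quoted message can be reproduced with Python's built-in codecs. This is only an illustrative check of the numbers, not Delphi code. The `surrogatepass` error handler is used here to imitate an encoder that treats each UTF-16 code unit as a separate character, and `hexdump` is a hypothetical helper for display.

```python
def hexdump(data):
    """Format bytes as space-separated upper-case hex (display helper)."""
    return " ".join("%02X" % b for b in data)

ch = "\U00010302"                      # the character from the quoted message

utf16 = ch.encode("utf-16-be")         # big-endian, no byte order mark
utf8 = ch.encode("utf-8")              # standard four-byte form

# Encode each surrogate on its own. Python refuses this by default, so the
# "surrogatepass" handler is needed to let lone surrogates through.
per_unit = ("\ud800".encode("utf-8", "surrogatepass")
            + "\udf02".encode("utf-8", "surrogatepass"))

print(hexdump(utf16))      # D8 00 DF 02
print(hexdump(utf8))       # F0 90 8C 82
print(hexdump(per_unit))   # ED A0 80 ED BC 82
```

The last line matches the six bytes reported for Delphi, which is consistent with each code unit being encoded independently.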
Franz-Leo Chomse Guest
|
Posted: Wed Jul 09, 2003 7:56 pm Post subject: Re: UTF8Encode, UTF8Decode and surrogates |
|
|
| Quote: | I think this is a bug and delphi should support surrogate pairs.
|
The routines are older than the definition of valid surrogate pairs.
Franz-Leo Chomse Guest
|
Posted: Wed Jul 09, 2003 8:22 pm Post subject: Re: UTF8Encode, UTF8Decode and surrogates |
|
|
On Wed, 9 Jul 2003 17:45:26 +0100, "Dmitri Oulitski"
<dmitri.ulitski (AT) nimbuspartners (DOT) com> wrote:
| Quote: | According to the Unicode specification, character U+10302
is encoded as D800 DF02 in UTF-16 and should be encoded as F0 90 8C 82
in UTF-8.
|
Which version of Unicode? There is also this statement:
| Quote: | Also note that the code positions U+D800 to U+DFFF (UTF-16 surrogates) as well as U+FFFE and U+FFFF
must not occur in normal UTF-8 or UCS-4 data. UTF-8 decoders should treat them like malformed or
overlong sequences for safety reasons.
|
Regards from Germany
Franz-Leo
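
The behaviour described in that statement can be observed with a strict decoder. The sketch below uses Python's UTF-8 codec, which rejects encoded surrogates by default, and then shows one possible way to repair data that was written with separately encoded surrogates. The name `repair_surrogate_utf8` is hypothetical, and the round trip through UTF-16 is one assumed approach among several.

```python
def repair_surrogate_utf8(data):
    """Decode UTF-8 bytes that contain separately encoded surrogates.

    Hypothetical helper: returns a str with each pair merged into one
    character; raises UnicodeDecodeError if a surrogate is left unpaired.
    """
    # Step 1: decode, letting surrogate code points through.
    text = data.decode("utf-8", "surrogatepass")
    # Step 2: round-trip via UTF-16 so adjacent surrogates combine.
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

bad = bytes.fromhex("eda080edbc82")    # surrogates encoded one by one

try:
    bad.decode("utf-8")                # strict decoding
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False                  # the strict decoder rejects the input

fixed = repair_surrogate_utf8(bad)
```

Here `strict_ok` ends up `False`, and `fixed.encode("utf-8")` yields F0 90 8C 82.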