BorlandTalk.com Forum Index BorlandTalk.com
Borland discussion newsgroups
 

UTF8Encode, UTF8Decode and surrogates

 
BorlandTalk.com Forum Index -> Delphi Internationalization
Dmitri Oulitski
Guest





Posted: Wed Jul 09, 2003 3:37 pm    Post subject: UTF8Encode, UTF8Decode and surrogates



Do UTF8Encode and UTF8Decode support surrogates?

If not, how can I add support for surrogates?

Thank you,

Dmitri Oulitski


FL
Guest





Posted: Wed Jul 09, 2003 4:22 pm    Post subject: Re: UTF8Encode, UTF8Decode and surrogates



UTF-8 is just an encoding method and inherently supports surrogates when
your app supports them.

Francisco

Dmitri Oulitski wrote:
Quote:

Do UTF8Encode and UTF8Decode support surrogates?


Danny Heijl
Guest





Posted: Wed Jul 09, 2003 7:34 pm    Post subject: Re: UTF8Encode, UTF8Decode and surrogates



On Windows you could try WideCharToMultiByte with a codepage of CP_UTF8.

Danny
---

"Dmitri Oulitski" <dmitri.ulitski (AT) nimbuspartners (DOT) com> wrote in message
news:3f0c46a9$1 (AT) newsgroups (DOT) borland.com...
Quote:
According to the Unicode specification, character U+10302
is encoded as D800 DF02 in UTF-16 and should be encoded as F0 90 8C 82
in UTF-8.

But Delphi encodes this character as ED A0 80 ED BC 82 in UTF-8,
i.e. Delphi treats each UTF-16 code unit separately,
even though the code unit sequence is a surrogate pair and should be treated
as one character.

I think this is a bug and delphi should support surrogate pairs.

Regards,

Dmitri Oulitski
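
The two byte sequences in the quoted post can be reproduced outside Delphi. What follows is only a minimal Python sketch, not Delphi's RTL code; the function name utf16_units_to_utf8 is made up for illustration. It pairs a high and a low surrogate into one code point before encoding (the behaviour Dmitri expects), and it uses Python's "surrogatepass" error handler to imitate encoding each UTF-16 code unit on its own.

```python
def utf16_units_to_utf8(units):
    """Encode a list of UTF-16 code units as UTF-8, pairing surrogates.

    Hypothetical helper for illustration only.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        if 0xD800 <= cp <= 0xDBFF and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
            # High surrogate followed by low surrogate: one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xD800 <= cp <= 0xDFFF:
            # A surrogate without its partner has no valid UTF-8 form.
            raise ValueError("unpaired surrogate U+%04X" % cp)
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)


# Surrogate pair treated as one character, U+10302.
paired = utf16_units_to_utf8([0xD800, 0xDF02])
print(paired.hex())      # f0908c82

# Each code unit encoded separately, as in the Delphi output quoted above.
per_unit = "\ud800\udf02".encode("utf-8", "surrogatepass")
print(per_unit.hex())    # eda080edbc82
```

The pairing step is the only difference between the two results: 0x10000 + ((0xD800 - 0xD800) << 10) + (0xDF02 - 0xDC00) gives 0x10302, which needs the four-byte UTF-8 form.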





Franz-Leo Chomse
Guest





Posted: Wed Jul 09, 2003 7:56 pm    Post subject: Re: UTF8Encode, UTF8Decode and surrogates


Quote:
I think this is a bug and delphi should support surrogate pairs.

The routines are older than the definition of valid surrogate pairs.



Franz-Leo Chomse
Guest





Posted: Wed Jul 09, 2003 8:22 pm    Post subject: Re: UTF8Encode, UTF8Decode and surrogates

On Wed, 9 Jul 2003 17:45:26 +0100, "Dmitri Oulitski"
<dmitri.ulitski (AT) nimbuspartners (DOT) com> wrote:

Quote:
According to Unicode specification character U+10302
will be encoded as D800 DF02 in UTF-16 and should be encoded as F0 90 8C 82
in UTF-8

Which version of Unicode? There is also this statement:

Quote:
Also note that the code positions U+D800 to U+DFFF (UTF-16 surrogates) as well as U+FFFE and U+FFFF
must not occur in normal UTF-8 or UCS-4 data. UTF-8 decoders should treat them like malformed or
overlong sequences for safety reasons.

Regards from Germany

Franz-Leo
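
As a rough illustration of the quoted rule, a strict UTF-8 decoder refuses bytes that encode code points in the range U+D800 to U+DFFF. The Python sketch below relies on the standard library's strict "utf-8" codec; the helper names is_surrogate and decodes_strictly are invented for this example and say nothing about how Delphi's UTF8Decode behaves.

```python
def is_surrogate(cp):
    """True if cp lies in the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def decodes_strictly(data):
    """True if data is accepted by Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The six-byte form from the thread encodes U+D800 and U+DF02 separately,
# so a strict decoder treats it as malformed.
print(decodes_strictly(bytes.fromhex("EDA080EDBC82")))  # False

# The four-byte form of U+10302 is ordinary UTF-8.
print(decodes_strictly(bytes.fromhex("F0908C82")))      # True
```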







All times are GMT