BorlandTalk.com
Borland discussion newsgroups

Re: idHttp/Get ... Anyway to get as WideStr?

 
BorlandTalk.com Forum Index -> Delphi Internet Winsock
Remy Lebeau (TeamB)
Guest





Posted: Tue Sep 27, 2005 5:58 pm    Post subject: Re: idHttp/Get ... Anyway to get as WideStr?




"dk_sz" <dk_sz (AT) hotmail (DOT) com> wrote


Quote:
So "FHTTP.Get(AURI);" returns a normal string... Should I
then parse it to see what character set it uses? If it's e.g. UTF-8...

You should look at the downloaded headers, not the actual string, to
determine that.

However, if the server returns Unicode content, obviously that won't fit
into a normal Ansi string in the first place. It would be better to
download the content into an intermediary TStream first, and then you can
extract the data from that as appropriate.


Gambit
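As a rough illustration of reading the charset from the headers rather than guessing from the body, here is a minimal Python sketch. The thread's code is Delphi/Indy; Python is used only to show the logic, and the function names are hypothetical. It assumes the raw response bytes and the Content-Type header value have already been obtained, for example by downloading into a stream first.

```python
import codecs
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str,
                              default: Optional[str] = None) -> Optional[str]:
    """Return the lowercase charset parameter of a Content-Type value, or default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns `default` when there is no charset parameter.
    return msg.get_content_charset(default)


def decode_body(body: bytes, content_type: str, default: str = "iso-8859-1") -> str:
    """Decode raw response bytes using the charset announced in the headers."""
    charset = charset_from_content_type(content_type, default)
    try:
        codecs.lookup(charset)
    except LookupError:
        # Unknown or misspelled charset name: fall back instead of raising.
        charset = default
    return body.decode(charset, errors="replace")
```

For example, `decode_body(raw, "text/html; charset=UTF-8")` decodes as UTF-8, while a bare `text/html` falls back to ISO-8859-1.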



Remy Lebeau (TeamB)
Guest





Posted: Tue Sep 27, 2005 10:02 pm    Post subject: Re: idHttp/Get ... Anyway to get as WideStr?




"dk_sz" <dk_sz (AT) hotmail (DOT) com> wrote


Quote:
Does the webserver check the output/document
(static or not) in question to determine response?

It can. The Content-Type header usually includes the charset of the data.

Quote:
Well... UTF-8 should be fine (AFAIK)...

You don't know ahead of time what the type of data will be, unless you
request just the headers (for example with a HEAD request) before requesting
the data. Best to just download to a type-independent stream and then process
the stream as needed.
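A minimal sketch of the "download first, decide later" approach, in Python with an `io.BytesIO` standing in for the intermediary TStream. The helper name `sniff_bom` is hypothetical; it only checks for a byte-order mark at the start of the buffered data and rewinds the stream so the full body can still be read afterwards. Most HTML is sent without a BOM, so a `None` result just means "look elsewhere" (headers or a meta tag).

```python
import codecs
import io
from typing import BinaryIO, Optional

# Byte-order marks, checked longest first so a UTF-32 LE mark
# (FF FE 00 00) is not mistaken for a UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(stream: BinaryIO) -> Optional[str]:
    """Peek at a seekable byte stream; return a codec name if it starts with a BOM."""
    start = stream.tell()
    head = stream.read(4)
    stream.seek(start)  # rewind so the caller can still read the whole body
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None
```

The returned codec names ("utf-16", "utf-32", "utf-8-sig") all consume the BOM during decoding, so `buf.read().decode(codec)` yields clean text.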

Quote:
(characters < 128 just take one byte in UTF-8, which is why
I believe I can probably parse the beginning of the HTML directly.
At least that is what I've read in a couple of places.)

You are assuming that the data is sent as 1-byte Ansi or as UTF-8 in the
first place. Also keep in mind that there are other encoding schemes
available that may be used (UTF-16, for example, uses at least two bytes
even for characters < 128).


Gambit
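To make the point about other encodings concrete, this small Python check (the helper name is hypothetical) encodes an ASCII-only meta tag in several codecs and keeps only those whose bytes match plain ASCII. UTF-8 and the single-byte Western code pages agree, but UTF-16 (two bytes per character, plus a BOM for the plain "utf-16" codec) and EBCDIC code page 037 do not, so scanning raw bytes for `<meta` only works when the page is in an ASCII-compatible encoding.

```python
from typing import Iterable, List


def encodings_agreeing_on_ascii(text: str, encodings: Iterable[str]) -> List[str]:
    """Return the encodings whose bytes for `text` are identical to plain ASCII."""
    ascii_bytes = text.encode("ascii")  # raises if `text` is not pure ASCII
    return [enc for enc in encodings if text.encode(enc) == ascii_bytes]


if __name__ == "__main__":
    tag = '<meta charset="utf-8">'
    candidates = ["utf-8", "iso-8859-1", "cp1252", "utf-16-le", "utf-16", "cp037"]
    print(encodings_agreeing_on_ascii(tag, candidates))
```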



dk_sz
Guest





Posted: Wed Sep 28, 2005 2:15 pm    Post subject: Re: idHttp/Get ... Anyway to get as WideStr?



Quote:
You don't know ahead of time what the type of data will be, unless you
request just the headers before then requesting the data. Best to just
download to a type-independant stream and then process the stream as
needed.

OK, but e.g. www.google.com and www.webmasterworld.com don't return a
response header with such info. (I used the Firefox Web Developer add-on to
check this.)
The response header just says: Content-Type: text/html (!)
As a fallback I will assume UTF-8 / iso-8859-1 (or compatible) and scan for
a meta tag.
If that fails as well, I intend to assume iso-8859-1. I think that's a fairly
good tradeoff?

thanks for the help
best regards
Thomas Schulz
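The fallback order described above (header charset, then a meta tag, then ISO-8859-1) can be sketched in Python as below. This is an illustration under stated assumptions, not how Indy does it: `choose_charset` and `_known` are hypothetical names, the header charset is assumed to be parsed already, the document is assumed ASCII-compatible so the meta tag can be matched on raw bytes, and only the first 1024 bytes are scanned (the same window the HTML Standard's encoding prescan uses).

```python
import codecs
import re
from typing import Optional

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def _known(name: str) -> bool:
    """True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def choose_charset(body: bytes, header_charset: Optional[str] = None,
                   fallback: str = "iso-8859-1") -> str:
    # 1. Trust the Content-Type charset when the server sent a usable one.
    if header_charset and _known(header_charset):
        return header_charset.lower()
    # 2. Scan the start of the document for a meta tag. This relies on the
    #    tag being ASCII and the page being in an ASCII-compatible encoding.
    match = _META_CHARSET.search(body[:1024])
    if match:
        name = match.group(1).decode("ascii")
        if _known(name):
            return name.lower()
    # 3. Last resort: ISO-8859-1 maps every byte to a character.
    return fallback
```

ISO-8859-1 works as the final fallback because decoding with it never raises; the trade-off is that undeclared UTF-8 text will show up as mojibake for any non-ASCII characters.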



All times are GMT