BorlandTalk.com Borland discussion newsgroups
Remy Lebeau (TeamB) Guest
Posted: Tue Sep 27, 2005 5:58 pm Post subject: Re: idHttp/Get ... Anyway to get as WideStr?
"dk_sz" <dk_sz (AT) hotmail (DOT) com> wrote
| Quote: | So "FHTTP.Get(AURI);" returns a normal string... Should I
then parse it to see what character set it uses? If it's e.g. UTF-8...
|
You should look at the downloaded headers, not the actual string, to
determine that.
However, if the server returns Unicode content, obviously that won't fit
into a normal Ansi string in the first place. It would be better to
download the content into an intermediary TStream first, and then you can
extract the data from that as appropriate.
Gambit
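For illustration, a minimal sketch of the stream approach described above, assuming Indy's TIdHTTP and its Get overload that writes into a TStream. DownloadRaw and the example URL are hypothetical names chosen for this sketch. The raw header list is read because, depending on the Indy version, Response.ContentType may not carry the charset parameter.

```pascal
program DownloadToStream;
{$APPTYPE CONSOLE}

uses
  Classes, IdHTTP;

// Hypothetical helper: downloads AURL into ADest as raw bytes and returns
// the Content-Type header value exactly as the server sent it (it may be
// empty, or may lack a charset parameter).
function DownloadRaw(const AURL: string; ADest: TStream): string;
var
  HTTP: TIdHTTP;
begin
  HTTP := TIdHTTP.Create(nil);
  try
    // Stream overload: the body is written byte for byte, with no
    // conversion to a Delphi string.
    HTTP.Get(AURL, ADest);
    Result := HTTP.Response.RawHeaders.Values['Content-Type'];
  finally
    HTTP.Free;
  end;
end;

var
  Body: TMemoryStream;
begin
  Body := TMemoryStream.Create;
  try
    // Placeholder URL, not taken from the thread.
    WriteLn(DownloadRaw('http://www.example.com/', Body));
    WriteLn(Body.Size, ' bytes downloaded');
  finally
    Body.Free;
  end;
end.
```

The bytes in the stream can then be decoded once the charset is known, rather than being forced through an Ansi string first.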
Remy Lebeau (TeamB) Guest
Posted: Tue Sep 27, 2005 10:02 pm Post subject: Re: idHttp/Get ... Anyway to get as WideStr?
"dk_sz" <dk_sz (AT) hotmail (DOT) com> wrote
| Quote: | Does the webserver check the output/document
(static or not) in question to determine response?
|
It can. The Content-Type header usually includes the charset of the data as a
parameter, e.g. "Content-Type: text/html; charset=utf-8".
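As a rough sketch of reading that charset out of a Content-Type value: ExtractCharset is a hypothetical helper, not an Indy function, and it assumes the parameter is written as "charset=" without spaces around the equals sign.

```pascal
program CharsetFromHeader;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the lowercased charset parameter of a
// Content-Type value, or '' when the value carries no charset.
function ExtractCharset(const AContentType: string): string;
var
  P: Integer;
begin
  Result := '';
  P := Pos('charset=', LowerCase(AContentType));
  if P = 0 then
    Exit;
  Result := Copy(AContentType, P + Length('charset='), MaxInt);
  // Cut off any further parameter that follows the charset.
  P := Pos(';', Result);
  if P > 0 then
    Result := Copy(Result, 1, P - 1);
  // Drop optional quotes and surrounding whitespace: charset="utf-8"
  Result := LowerCase(Trim(StringReplace(Result, '"', '', [rfReplaceAll])));
end;

begin
  WriteLn(ExtractCharset('text/html; charset=UTF-8')); // prints utf-8
  WriteLn(ExtractCharset('text/html'));                // prints an empty line
end.
```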
| Quote: | Well... UTF-8 should be fine (AFAIK)...
|
You don't know ahead of time what the type of data will be, unless you
request just the headers before then requesting the data. Best to just
download to a type-independent stream and then process the stream as needed.
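Requesting just the headers first could look roughly like this, assuming Indy's TIdHTTP.Head method. FetchContentType and the URL are hypothetical. Note that this costs an extra round trip, which is one reason the single download to a stream is usually simpler.

```pascal
program HeadersFirst;
{$APPTYPE CONSOLE}

uses
  IdHTTP;

// Hypothetical helper: issues a HEAD request and returns the raw
// Content-Type header value without downloading the body.
function FetchContentType(const AURL: string): string;
var
  HTTP: TIdHTTP;
begin
  HTTP := TIdHTTP.Create(nil);
  try
    HTTP.Head(AURL);
    Result := HTTP.Response.RawHeaders.Values['Content-Type'];
  finally
    HTTP.Free;
  end;
end;

begin
  // Placeholder URL, not taken from the thread.
  WriteLn(FetchContentType('http://www.example.com/'));
end.
```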
| Quote: | (characters < 128 just take one byte in UTF-8, which is why
I believe I can probably parse the beginning of the HTML directly.
At least that is what I've read in a couple of places.)
|
You are assuming that the data is sent as 1-byte Ansi or as UTF-8 in the
first place. Also keep in mind that other encoding schemes may be used, such
as UTF-16, in which even characters < 128 occupy two bytes.
Gambit
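One way to catch some of those other encodings before parsing is to check the downloaded stream for a byte order mark. This is only a sketch: SniffBOM is a hypothetical helper, many documents carry no BOM at all, and the FF FE pattern is also the start of a UTF-32LE BOM, which this sketch does not distinguish.

```pascal
program SniffBomDemo;
{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: inspects the first bytes of AStream for a byte
// order mark and returns a charset name, or '' if no BOM is found.
// The stream position is restored afterwards.
function SniffBOM(AStream: TStream): string;
var
  B: array[0..2] of Byte;
  N: Integer;
  SavedPos: Int64;
begin
  Result := '';
  SavedPos := AStream.Position;
  AStream.Position := 0;
  N := AStream.Read(B, SizeOf(B));
  AStream.Position := SavedPos;
  if (N >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
    Result := 'utf-8'
  else if (N >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    Result := 'utf-16le'   // could also be UTF-32LE; not checked here
  else if (N >= 2) and (B[0] = $FE) and (B[1] = $FF) then
    Result := 'utf-16be';
end;

const
  // UTF-16LE BOM followed by '<' encoded as two bytes.
  SampleData: array[0..3] of Byte = ($FF, $FE, $3C, $00);

var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(SampleData, SizeOf(SampleData));
    WriteLn(SniffBOM(MS)); // prints utf-16le
  finally
    MS.Free;
  end;
end.
```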
dk_sz Guest
Posted: Wed Sep 28, 2005 2:15 pm Post subject: Re: idHttp/Get ... Anyway to get as WideStr?
| Quote: | You don't know ahead of time what the type of data will be, unless you
request just the headers before then requesting the data. Best to just
download to a type-independent stream and then process the stream as
needed.
|
OK, but e.g. www.google.com and www.webmasterworld.com don't return a
response header with such info. (I used the Firefox Web Developer add-in to
check this.)
The response header just returns: Content-Type: text/html (!)
As a fallback I will assume UTF-8 / iso-8859-1 (or compatible) and scan for a
meta tag.
If that fails as well, I intend to assume iso-8859-1. I think that's a fairly
good tradeoff?
thanks for the help
best regards
Thomas Schulz
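The fallback order described in this post (header charset, then meta tag, then iso-8859-1) might be sketched as follows. CharsetToken and PickCharset are hypothetical helpers, and the sketch assumes the start of the document has already been read into a string as ASCII-compatible bytes, which does not hold for UTF-16 content.

```pascal
program PickCharsetDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: finds 'charset=' in S and returns the lowercased
// token that follows it, or '' if there is none.
function CharsetToken(const S: string): string;
var
  P: Integer;
  L: string;
begin
  Result := '';
  L := LowerCase(S);
  P := Pos('charset=', L);
  if P = 0 then
    Exit;
  Inc(P, Length('charset='));
  // Skip an opening quote, as in <meta charset="utf-8">.
  if (P <= Length(L)) and (L[P] in ['"', '''']) then
    Inc(P);
  // Collect characters that can appear in a charset name.
  while (P <= Length(L)) and
        (L[P] in ['a'..'z', '0'..'9', '-', '_', '.', ':']) do
  begin
    Result := Result + L[P];
    Inc(P);
  end;
end;

// Header first, then the meta tag in the HTML, then iso-8859-1.
function PickCharset(const AContentType, AHtmlStart: string): string;
begin
  Result := CharsetToken(AContentType);
  if Result = '' then
    Result := CharsetToken(AHtmlStart);
  if Result = '' then
    Result := 'iso-8859-1';
end;

begin
  // prints utf-8
  WriteLn(PickCharset('text/html; charset=UTF-8', ''));
  // prints windows-1252
  WriteLn(PickCharset('text/html',
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'));
  // prints iso-8859-1
  WriteLn(PickCharset('text/html', '<html><body>'));
end.
```

A byte order mark check on the raw stream would need to run before this, since the meta tag scan only works on ASCII-compatible data.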