 |
BorlandTalk.com Borland discussion newsgroups
|
Robert Oschler Guest
|
Posted: Sun Nov 23, 2003 2:01 am Post subject: Parse HTML documents without using TWebBrowser? |
|
|
Is there a way to parse an HTML file, possibly from a string, without having
to get involved with TWebBrowser?
I'd like to use MSHTML directly to parse a string/file based HTML document
and have it return an instance of IHTMLDocument2. Is there a way to do
this?
thx
--
Robert Oschler
http://www.dog-images.com -- Devoted to providing free info on the health,
nutrition, and training of dogs.
|
|
|
 |
Remy Lebeau (TeamB) Guest
|
Posted: Sun Nov 23, 2003 5:16 am Post subject: Re: Parse HTML documents without using TWebBrowser? |
|
|
"Robert Oschler" <no_replies (AT) fake_email_address (DOT) invalid> wrote
| Quote: | Is there a way to parse an HTML file, possibly from a string,
without having to get involved with TWebBrowser?
|
Of course. Just use a third-party HTML parser, or parse the HTML manually.
The real question is, though - what exactly do you want to do with the HTML?
Are you just looking for a way to get data out of the HTML, or do you want
to display it, or what? Please be more specific.
| Quote: | I'd like to use MSHTML directly to parse a string/file based
HTML document and have it return an instance of IHTMLDocument2.
Is there a way to do this?
|
You cannot instantiate IHTMLDocument2 directly, you can only obtain an
instance of it from an existing browser instance. MSHTML is a library of
browser controls, not standalone parsers.
Gambit
|
|
|
 |
Robert Oschler Guest
|
Posted: Sun Nov 23, 2003 5:46 pm Post subject: Re: Parse HTML documents without using TWebBrowser? |
|
|
"Remy Lebeau (TeamB)" <gambit47.no.spam (AT) no (DOT) spam.yahoo.com> wrote
| Quote: |
"Robert Oschler" <no_replies (AT) fake_email_address (DOT) invalid> wrote in message
news:3fc0148d$1 (AT) newsgroups (DOT) borland.com...
| Quote: | Is there a way to parse an HTML file, possibly from a string,
without having to get involved with TWebBrowser?
|
Of course. Just use a third-party HTML parser, or parse the HTML manually.
The real question is, though - what exactly do you want to do with the HTML?
Are you just looking for a way to get data out of the HTML, or do you want
to display it, or what? Please be more specific.
| Quote: | I'd like to use MSHTML directly to parse a string/file based
HTML document and have it return an instance of IHTMLDocument2.
Is there a way to do this?
|
You cannot instantiate IHTMLDocument2 directly, you can only obtain an
instance of it from an existing browser instance. MSHTML is a library of
browser controls, not standalone parsers.
Gambit
|
Remy,
I wanted to be able to take advantage of the MSHTML interfaces
(IHTMLDocument2, IHTMLElement, etc.) by having HTML code parsed into an
IHTMLDocument2 instance. But I take it from your message that doing it
without using a browser instance is a rough road at best.
thx
--
Robert Oschler
http://www.dog-images.com -- Devoted to providing free info on the health,
nutrition, and training of dogs.
|
|
|
 |
Eddie Shipman Guest
|
Posted: Sun Nov 23, 2003 7:33 pm Post subject: Re: Parse HTML documents without using TWebBrowser? |
|
|
In article <3fc0148d$1 (AT) newsgroups (DOT) borland.com>,
no_replies (AT) fake_email_address (DOT) invalid says...
| Quote: | Is there a way to parse an HTML file, possibly from a string, without having
to get involved with TWebBrowser?
I'd like to use MSHTML directly to parse a string/file based HTML document
and have it return an instance of IHTMLDocument2. Is there a way to do
this?
|
It's not that difficult. You can either use the IEParser from
www.ultimind.com/iedelphi or the extIEParser that is available from the
delphi-webbrowser Yahoo group (you must be a member to download):
http://groups.yahoo.com/group/delphi-webbrowser
Or you can take the HTML string and just slap it into an IHTMLDocument2
using the tip on Henri Fournier's WebBrowser faqs
http://members.shaw.ca/iedelphi/webbrowser.htm#advanced4
Of course you'd put it into your own IHTMLDocument2 object instead of
the webbrowser's.
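One way that idea might look in Delphi, as a minimal sketch and not the code from either linked page: create the MSHTML document object through COM and feed it the string with IHTMLDocument2.write. The function name ParseHTMLString is hypothetical. The sketch assumes COM is already initialized on the calling thread (a normal VCL application does this) and that the MSHTML import unit declares CLASS_HTMLDocument. Scripts in the HTML may run while it is parsed, and for larger documents you may need to check readyState before walking the elements.

```pascal
uses
  ActiveX, ComObj, Variants, MSHTML;

// Hypothetical helper, not from the thread: parses an HTML string into a
// document object that is not hosted by any TWebBrowser.
function ParseHTMLString(const HTML: string): IHTMLDocument2;
var
  V: OleVariant;
begin
  // Create the MSHTML document object directly through COM.
  Result := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;

  // IHTMLDocument2.write takes a SAFEARRAY of variants, so wrap the
  // string in a one-element variant array.
  V := VarArrayCreate([0, 0], varVariant);
  V[0] := HTML;
  Result.write(PSafeArray(TVarData(V).VArray));

  // Close the output stream so the parser finishes the document.
  Result.close;
end;
```

A call such as Doc := ParseHTMLString(HTMLString) would then give access to Doc.body, Doc.all and the other MSHTML interfaces mentioned earlier in the thread.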
|
|
|
 |
Remy Lebeau (TeamB) Guest
|
Posted: Sun Nov 23, 2003 11:33 pm Post subject: Re: Parse HTML documents without using TWebBrowser? |
|
|
"Robert Oschler" <no_replies (AT) fake_email_address (DOT) invalid> wrote
| Quote: | I wanted to be able to take advantage of the MSHTML
interfaces (IHTMLDocument2, IHTMLElement, etc.) by
having HTML code parsed into an IHTMLDocument2 instance.
But I take it from your message that doing it without using a
browser instance is a rough road at best.
|
It is not just rough - it is impossible. MSHTML was not set up for what you
ask, it has to be used in the context of an actual browser only.
Gambit
|
|
|
 |
Remy Lebeau (TeamB) Guest
|
Posted: Sun Nov 23, 2003 11:42 pm Post subject: Re: Parse HTML documents without using TWebBrowser? |
|
|
"Eddie Shipman" <eshipman@yahoo!!!.com> wrote
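| Quote: | Or you can take the HTML string and just slap it into an
IHTMLDocument2 using the tip on Henri Fournier's WebBrowser faqs
http://members.shaw.ca/iedelphi/webbrowser.htm#advanced4
|
That code is still using the browser's IHTMLDocument2, not a standalone
instance.
Alternatively, to load a document with content, you can use the
IPersistStreamInit interface, for example: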
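// Needs MSHTML, ActiveX (IPersistStreamInit) and Classes
// (TStringStream, TStreamAdapter) in the uses clause.
var
  HTMLDocument: IHTMLDocument2;
  PersistStream: IPersistStreamInit;
  FStream: TStringStream;
begin
  // WebBrowser1.Document is nil until the browser has loaded a page
  // (about:blank is enough), so navigate before running this.
  HTMLDocument := WebBrowser1.Document as IHTMLDocument2;
  PersistStream := HTMLDocument as IPersistStreamInit;
  FStream := TStringStream.Create(HTMLString);
  try
    // soReference: the adapter does not own FStream; it is freed below.
    PersistStream.Load(TStreamAdapter.Create(FStream, soReference));
    ...
  finally
    FStream.Free;
  end;
end;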
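The snippet above relies on WebBrowser1.Document already being assigned, which is not the case for a freshly created TWebBrowser. A minimal sketch of one common way to guarantee that is to load about:blank first and wait for the control to finish. The procedure name EnsureDocument is hypothetical, and the busy-wait with Application.ProcessMessages is a simplification that assumes a VCL application with a running message loop.

```pascal
uses
  Forms, SHDocVw;

// Hypothetical helper, not from the thread: makes sure the browser
// control has a document before it is cast to IHTMLDocument2.
procedure EnsureDocument(Browser: TWebBrowser);
begin
  if Browser.Document = nil then
  begin
    // Loading a blank page makes the control create its document object.
    Browser.Navigate('about:blank');
    // Pump messages until the control reports that loading is complete.
    while Browser.ReadyState <> READYSTATE_COMPLETE do
      Application.ProcessMessages;
  end;
end;
```

Calling EnsureDocument(WebBrowser1) before the cast to IHTMLDocument2 should avoid an access violation on a nil Document.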
| Quote: | Of course you'd put it into your own IHTMLDocument2 object
instead of the webbrowser's.
|
You cannot instantiate IHTMLDocument2 on its own.
Gambit
|
|
|
 |
eshipman Guest
|
Posted: Mon Nov 24, 2003 2:07 pm Post subject: Re: Parse HTML documents without using TWebBrowser? |
|
|
In article <3fc14504$1 (AT) newsgroups (DOT) borland.com>, "Remy Lebeau (TeamB)"
<gambit47.no.spam (AT) no (DOT) spam.yahoo.com> says...
| Quote: |
"Eddie Shipman" <eshipman@yahoo!!!.com> wrote in message
news:MPG.1a2ac77787e344e7989690 (AT) forums (DOT) borland.com...
| Quote: | Or you can take the HTML string and just slap it into an
IHTMLDocument2 using the tip on Henri Fournier's WebBrowser faqs
http://members.shaw.ca/iedelphi/webbrowser.htm#advanced4
|
That code is still using the browser's IHTMLDocument2, not a standalone
instance.
|
Did I not state that:
"Of course you'd put it into your own IHTMLDocument2 object instead of
the webbrowser's"
|
|
|
 |
|
|
|
|