BorlandTalk.com
Borland discussion newsgroups
 

Parse HTML documents without using TWebBrowser?

 
BorlandTalk.com Forum Index -> Delphi Internet Winsock
Robert Oschler
Guest
Posted: Sun Nov 23, 2003 2:01 am    Post subject: Parse HTML documents without using TWebBrowser?

Is there a way to parse an HTML file, possibly from a string, without having
to get involved with TWebBrowser?

I'd like to use MSHTML directly to parse a string/file based HTML document
and have it return an instance of IHTMLDocument2. Is there a way to do
this?

thx

--
Robert Oschler
http://www.dog-images.com -- Devoted to providing free info on the health,
nutrition, and training of dogs.



Remy Lebeau (TeamB)
Guest
Posted: Sun Nov 23, 2003 5:16 am    Post subject: Re: Parse HTML documents without using TWebBrowser?

"Robert Oschler" <no_replies (AT) fake_email_address (DOT) invalid> wrote

Quote:
Is there a way to parse an HTML file, possibly from a string,
without having to get involved with TWebBrowser?

Of course. Just use a third-party HTML parser, or parse the HTML manually.

The real question is, though - what exactly do you want to do with the HTML?
Are you just looking for a way to get data out of the HTML, or do you want
to display it, or what? Please be more specific.
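
For a sense of what "parse the HTML manually" can mean in the simplest case, here is a minimal sketch that pulls the page title out of an HTML string with plain string functions. The program name and the ExtractTitleText helper are hypothetical, invented for illustration; the approach assumes a bare <title> tag with no attributes and would not cope with comments, scripts, or malformed markup.

```delphi
program ExtractTitle;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the text between <title> and </title>,
// or an empty string if either tag is missing. Matching is
// case-insensitive; attributes and malformed markup are not handled.
function ExtractTitleText(const HTML: string): string;
var
  Lower: string;
  StartPos, EndPos: Integer;
begin
  Result := '';
  // LowerCase only changes ASCII letters, so character positions in
  // Lower line up with positions in HTML.
  Lower := LowerCase(HTML);
  StartPos := Pos('<title>', Lower);
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length('<title>'));
  EndPos := Pos('</title>', Lower);
  if EndPos < StartPos then
    Exit;
  Result := Trim(Copy(HTML, StartPos, EndPos - StartPos));
end;

begin
  WriteLn(ExtractTitleText('<html><head><TITLE> Dogs </TITLE></head></html>'));
end.
```

Anything beyond this kind of one-off extraction (nested tags, attributes, broken markup) is where a dedicated parser is likely to pay for itself.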

Quote:
I'd like to use MSHTML directly to parse a string/file based
HTML document and have it return an instance of IHTMLDocument2.
Is there a way to do this?

You cannot instantiate IHTMLDocument2 directly, you can only obtain an
instance of it from an existing browser instance. MSHTML is a library of
browser controls, not standalone parsers.


Gambit



Robert Oschler
Guest
Posted: Sun Nov 23, 2003 5:46 pm    Post subject: Re: Parse HTML documents without using TWebBrowser?

"Remy Lebeau (TeamB)" <gambit47.no.spam (AT) no (DOT) spam.yahoo.com> wrote

Quote:
You cannot instantiate IHTMLDocument2 directly, you can only obtain an
instance of it from an existing browser instance. MSHTML is a library of
browser controls, not standalone parsers.

Remy,

I wanted to be able to take advantage of the MSHTML interfaces
(IHTMLDocument2, IHTMLElement, etc.) by having HTML code parsed into an
IHTMLDocument2 instance. But I take it from your message that doing it
without using a browser instance is a rough road at best.
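
To illustrate the kind of access those interfaces give once an IHTMLDocument2 is available from somewhere, here is a minimal sketch that collects the link targets in a document. It assumes the MSHTML unit that Delphi generates from the MSHTML type library; the unit name HtmlLinks and the ListLinks procedure are hypothetical names made up for this example.

```delphi
unit HtmlLinks;

interface

uses
  Classes, SysUtils, MSHTML;

// Hypothetical helper: appends the href of every anchor in Doc to Output.
procedure ListLinks(const Doc: IHTMLDocument2; Output: TStrings);

implementation

procedure ListLinks(const Doc: IHTMLDocument2; Output: TStrings);
var
  Links: IHTMLElementCollection;
  Item: IDispatch;
  Anchor: IHTMLAnchorElement;
  I: Integer;
begin
  Links := Doc.links;
  for I := 0 to Links.length - 1 do
  begin
    Item := Links.item(I, 0);
    // The links collection can also hold <area> elements, so only
    // items that support IHTMLAnchorElement are reported.
    if Supports(Item, IHTMLAnchorElement, Anchor) then
      Output.Add(Anchor.href);
  end;
end;

end.
```

The document would need to have finished loading before the collection is read.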

thx

--
Robert Oschler
http://www.dog-images.com -- Devoted to providing free info on the health,
nutrition, and training of dogs.





Eddie Shipman
Guest
Posted: Sun Nov 23, 2003 7:33 pm    Post subject: Re: Parse HTML documents without using TWebBrowser?

In article <3fc0148d$1 (AT) newsgroups (DOT) borland.com>,
no_replies (AT) fake_email_address (DOT) invalid says...
Quote:
Is there a way to parse an HTML file, possibly from a string, without having
to get involved with TWebBrowser?

I'd like to use MSHTML directly to parse a string/file based HTML document
and have it return an instance of IHTMLDocument2. Is there a way to do
this?


It's not that difficult. You can either use the IEParser from
www.ultimind.com/iedelphi or the extIEParser that is available from the
delphi-webbrowser Yahoo group (you must be a member to download):
http://groups.yahoo.com/group/delphi-webbrowser

Or you can take the HTML string and just slap it into an IHTMLDocument2
using the tip on Henri Fournier's WebBrowser faqs
http://members.shaw.ca/iedelphi/webbrowser.htm#advanced4
Of course you'd put it into your own IHTMLDocument2 object instead of
the WebBrowser's.
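
A minimal sketch of the "your own IHTMLDocument2 object" idea follows. It assumes the Delphi MSHTML type library import exposes CLASS_HTMLDocument and that COM has already been initialized on the calling thread (a VCL application normally does this; a console program would need CoInitialize). The unit name HtmlStringDoc and the function HtmlStringToDocument are hypothetical. Whether a document created this way behaves well enough for a given page is worth testing against the installed IE version; it is a sketch, not a guaranteed recipe.

```delphi
unit HtmlStringDoc;

interface

uses
  ActiveX, ComObj, Variants, MSHTML;

// Hypothetical helper: asks MSHTML to parse HTML into a document object
// that is not hosted by a TWebBrowser control.
function HtmlStringToDocument(const HTML: string): IHTMLDocument2;

implementation

function HtmlStringToDocument(const HTML: string): IHTMLDocument2;
var
  Content: OleVariant;
begin
  // Create the document through COM rather than taking it from a browser.
  Result := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;

  // IHTMLDocument2.write takes a SAFEARRAY of variants, so the string
  // is wrapped in a one-element variant array.
  Content := VarArrayCreate([0, 0], varVariant);
  Content[0] := HTML;
  Result.write(PSafeArray(TVarData(Content).VArray));
  Result.close;
end;

end.
```

Usage would look like Doc := HtmlStringToDocument(HTMLString), after which Doc.body, Doc.all and the other MSHTML collections can be inspected. Scripts and external resources referenced by the HTML may still be processed by MSHTML, so untrusted input deserves some care.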

Remy Lebeau (TeamB)
Guest
Posted: Sun Nov 23, 2003 11:33 pm    Post subject: Re: Parse HTML documents without using TWebBrowser?

"Robert Oschler" <no_replies (AT) fake_email_address (DOT) invalid> wrote


Quote:
I wanted to be able to take advantage of the MSHTML
interfaces (IHTMLDocument2, IHTMLElement, etc.) by
having HTML code parsed into an IHTMLDocument2 instance.
But I take it from your message that doing it without using a
browser instance is a rough road at best.

It is not just rough - it is impossible. MSHTML was not set up for what you
ask, it has to be used in the context of an actual browser only.


Gambit



Remy Lebeau (TeamB)
Guest
Posted: Sun Nov 23, 2003 11:42 pm    Post subject: Re: Parse HTML documents without using TWebBrowser?

"Eddie Shipman" <eshipman@yahoo!!!.com> wrote


Quote:
Or you can take the HTML string and just slap it into an
IHTMLDocument2 using the tip on Henri Fournier's WebBrowser faqs
http://members.shaw.ca/iedelphi/webbrowser.htm#advanced4

That code is still using the browser's IHTMLDocument2, not a standalone
instance.

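Alternatively, to load a document with content, you can use the
IPersistStreamInit interface, for example:

var
  HTMLDocument: IHTMLDocument2;
  PersistStream: IPersistStreamInit;
  FStream: TStringStream;
  Adapter: IStream;
begin
  // Needs ActiveX, Classes and MSHTML in the uses clause.
  // WebBrowser1.Document is nil until a page (such as about:blank) has loaded.
  HTMLDocument := WebBrowser1.Document as IHTMLDocument2;
  PersistStream := HTMLDocument as IPersistStreamInit;
  FStream := TStringStream.Create(HTMLString);
  try
    // Hold the adapter in an interface variable so it is reference counted;
    // soReference leaves ownership of FStream with this routine.
    Adapter := TStreamAdapter.Create(FStream, soReference);
    PersistStream.Load(Adapter);
    ...
  finally
    FStream.Free;
  end;
end;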

Quote:
Of course you'd put it into your own IHTMLDocument2 object
instead of the WebBrowser's.

You cannot instantiate IHTMLDocument2 on its own.


Gambit



eshipman
Guest
Posted: Mon Nov 24, 2003 2:07 pm    Post subject: Re: Parse HTML documents without using TWebBrowser?

In article <3fc14504$1 (AT) newsgroups (DOT) borland.com>, "Remy Lebeau (TeamB)"
<gambit47.no.spam (AT) no (DOT) spam.yahoo.com> says...
Quote:

"Eddie Shipman" <eshipman@yahoo!!!.com> wrote in message
news:MPG.1a2ac77787e344e7989690 (AT) forums (DOT) borland.com...

Or you can take the HTML string and just slap it into an
IHTMLDocument2 using the tip on Henri Fournier's WebBrowser faqs
http://members.shaw.ca/iedelphi/webbrowser.htm#advanced4

That code is still using the browser's IHTMLDocument2, not a standalone
instance.

Did I not state that:


"Of course you'd put it into your own IHTMLDocument2 object instead of
the WebBrowser's"



All times are GMT