BorlandTalk.com
Borland discussion newsgroups
 

HTML Parser

 
BorlandTalk.com Forum Index -> Delphi Thirdparty Tools (General)

adem
Posted: Sun Jul 24, 2005 3:41 am    Post subject: HTML Parser



I am looking for a HTML parser that will turn a HTML page
into proper objects

i.e. tables that I can access by row and column, text
that I can get font/charset information etc.

I am not really interested in images/pictures; just the
text stuff.

And, it needs to be able to handle Unicode too.

Does anyone know of such a component --freeware with
source would of course be preferred as I would be
willing to do my own code maintenance and contribute
back, but --failing that-- commercial solutions are
OK too.

TVMIA

Cheers,
Adem

Robert Baker
Posted: Sun Jul 24, 2005 6:42 am    Post subject: Re: HTML Parser



Adem,

I recently found "Extended IEParser V2" at
http://groups.yahoo.com/group/delphi-webbrowser/files/. To get the rows and
column data I needed, I had to write a few lines of code and the HTML was
not in "objects", but you could certainly add that feature easily enough.
BTW, the price [free] was right for my needs.

For a modest fee DIHtmlParser [at
http://www.zeitungsjunge.de/delphi/htmlparser/] looked like it might be more
advanced. Its price seems reasonable but I got what I needed from the free
one so I did not evaluate it thoroughly and cannot address its "object"
nature.

Best of luck,

Robert Baker



"adem" <adembaba (AT) excite (DOT) com> wrote

Quote:
I am looking for a HTML parser that will turn a HTML page
into proper objects



listmember
Posted: Sun Jul 24, 2005 12:30 pm    Post subject: Re: HTML Parser



Robert Baker wrote:

Hi Robert,

Quote:
I recently found "Extended IEParser V2" at
http://groups.yahoo.com/group/delphi-webbrowser/files/. To get the
rows and column data I needed, I had to write a few lines of code and
the HTML was not in "objects",

I did try to get that file, but it seems it has been removed from there.
I did join the group first :-)

Quote:
but you could certainly add that feature easily enough.

Can you or somebody else help me with this bit? It is
probably easier than I think, but I don't know how to do it.

Quote:
BTW, the price [free] was right for my needs.

For a modest fee DIHtmlParser [at
http://www.zeitungsjunge.de/delphi/htmlparser/] looked like it might
be more advanced. Its price seems reasonable but I got what I needed
from the free one so I did not evaluate it thoroughly and cannot
address its "object" nature.

I am aware of DIHtmlParser, but it only parses HTML pages; it does not
turn them into objects. There are others that handle the parsing well
enough for me.

Quote:
Best of luck,

Thanks :-)

Actually, what I am looking for is something like Ben Ziegler's
components --anyone remember WABD?

http://www.radix.net/~bziegler/Delphi/Index.html

Now it is kbmWABD:

http://www.components4programmers.com/products/kbmwabd/

but without the visual aspects.

Cheers,
Adem

John McTaggart
Posted: Sun Jul 24, 2005 6:04 pm    Post subject: Re: HTML Parser

Quote:
I am looking for a HTML parser that will turn a HTML page
into proper objects

i.e. tables that I can access by row and column, text
that I can get font/charset information etc.

I wrote one called ATagParser that handles pretty much
anything based on a tag format. It can easily handle table
rows and columns, but is limited to ANSI and UTF-8. Actually,
it can handle UTF-16LE and BE as well, but converts them
to their UTF-8 equivalents.

http://www.compnet101.com/atagparser

The latest version is not online, but if you send me a
message with the version of Delphi that you're using
I'll send you the latest package.

BTW, I only go up to D7. I know the source compiles under
all versions, including D8, 2005 and even BCB6, but I
don't own any of them, sorry.

John McTaggart
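The UTF-16 handling described above (accepting UTF-16LE/BE input and converting it to UTF-8) could be sketched in Python roughly as follows. This is not ATagParser's code: the `to_utf8` name is hypothetical, and the sketch assumes UTF-16 input is identified by its byte order mark, with anything lacking one treated as already ANSI or UTF-8.

```python
import codecs


def to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16 (LE or BE) input as UTF-8, detecting it by its BOM.

    Assumption: UTF-16 input starts with a byte order mark. Anything else
    is returned unchanged, on the assumption that it is already ANSI/UTF-8.
    (UTF-32 BOMs are not considered here.)
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        return raw  # no UTF-16 BOM: pass through untouched
    return text.encode("utf-8")


# Usage: markup saved as UTF-16BE comes back as UTF-8 bytes.
page = codecs.BOM_UTF16_BE + "<td>Grüße</td>".encode("utf-16-be")
print(to_utf8(page))  # b'<td>Gr\xc3\xbc\xc3\x9fe</td>'
```

Normalising everything to one byte encoding up front means the tag scanner itself only ever has to deal with 8-bit input.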



listmember
Posted: Sun Jul 24, 2005 7:32 pm    Post subject: Re: HTML Parser

John McTaggart wrote:

John,

Quote:
http://www.compnet101.com/atagparser

I took a look at your site. From the property editor
screenshot, I can see that it is event-based. While
this is very good for speed as a general tag-based
document parser, it leaves me with the job of constructing
and filling all the data for the object tree I need to have.

What I am looking for is one that turns all that information
into objects.

Cheers,
Adem


Robert Baker
Posted: Sun Jul 24, 2005 9:55 pm    Post subject: Re: HTML Parser

Adem,

RE: I did try to get that file, but it seems it has been removed from there. -
I did join the group first :-)

Hmm, no idea why you had this problem; I tried again just now and the
file downloaded perfectly. BTW, I had to muck around with it a bit as it was
not updated for D7. In any case, it sounds like you will not be happy with
it, as the parsing does not place the results into "objects". Please let me
know if you find such a solution.

Good luck,

Robert


John McTaggart
Posted: Sun Jul 24, 2005 11:20 pm    Post subject: Re: HTML Parser

Quote:
I took a look at your site. From the property editor
screenshot, I can see that it is event-based. While
this is very good for speed as a general tag-based
document parser, it leaves me with the job of constructing
and filling all the data for the object tree I need to have.

What I am looking for is one that turns all that information
into objects.

Understandable.

I wrote it as the base that could be used to construct
higher-level objects like trees.

I use it to parse several RSS feeds into a tree and it's
pretty simple. Each of the tags, text, data etc. is
returned in its corresponding event as a persistent object
with an index number.

Having that, it's pretty simple.

Speed-wise, it's way snappy.

John McTaggart
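The pattern John describes, where an event-based tag parser reports each tag and text run and the caller assembles the object tree, can be sketched in Python using the standard library's `html.parser.HTMLParser` as a stand-in event source. This is a hypothetical illustration, not ATagParser's (or any Delphi library's) API: the `TableBuilder` class and its `tables[table][row][column]` layout are invented here, nested tables are not supported, and font/charset attributes are ignored.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableBuilder(HTMLParser):
    """Turn parser events into tables addressable as tables[t][row][col].

    Hypothetical sketch: nested tables are not supported, and only the
    text of each cell is kept (attributes such as fonts are ignored).
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive decoded
        self.tables: list[list[list[str]]] = []  # finished tables
        self._table: list[list[str]] | None = None  # table being built
        self._cell: list[str] | None = None  # text fragments of the open cell

    def _flush_cell(self) -> None:
        # Close the open cell, if any; this also covers HTML's optional </td>.
        if self._cell is not None and self._table is not None:
            if not self._table:  # cell appeared before any <tr>
                self._table.append([])
            self._table[-1].append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif self._table is None:
            return  # ignore rows/cells outside a table
        elif tag == "tr":
            self._flush_cell()
            self._table.append([])
        elif tag in ("td", "th"):
            self._flush_cell()
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._flush_cell()
        elif tag == "table" and self._table is not None:
            self._flush_cell()
            self.tables.append(self._table)
            self._table = None

    def handle_data(self, data):
        if self._cell is not None:  # text outside cells is dropped
            self._cell.append(data)


# Usage: the last row deliberately omits its closing </td> tags.
page = """<table>
  <tr><th>Name</th><th>Version</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>R&amp;D<td>n/a</tr>
</table>"""
builder = TableBuilder()
builder.feed(page)
builder.close()
print(builder.tables[0][1][0])  # alpha
print(builder.tables[0][2])  # ['R&D', 'n/a']
```

The parser itself stays event-based; only the small amount of state kept in `TableBuilder` turns those events into cells that can be read by row and column.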



Eddie Shipman
Posted: Tue Aug 09, 2005 1:51 pm    Post subject: Re: HTML Parser

In article <xn0e54chl9zopsq01y (AT) forums (DOT) borland.com>, adembaba (AT) excite (DOT) com
says...
Quote:
I am looking for a HTML parser that will turn a HTML page
into proper objects

i.e. tables that I can access by row and column, text
that I can get font/charset information etc.

I am not really interested in images/pictures; just the
text stuff.

And, it needs to be able to handle Unicode too.

Does anyone know of such a component --freeware with
source would of course be preferred as I would be
willing to do my own code maintenance and contribute
back, but --failing that-- commercial solutions are
OK too.

Sounds like you are talking about the MSHTML DOM.
You can get all of this from the DOM just
by examining the right properties.

And it handles Unicode if your browser version does.

All times are GMT