BorlandTalk.com Borland discussion newsgroups
adem Guest
Posted: Sun Jul 24, 2005 3:41 am Post subject: HTML Parser
I am looking for a HTML parser that will turn a HTML page into proper objects, i.e. tables that I can access by row and column, and text for which I can get font/charset information, etc.
I am not really interested in images/pictures, just the text stuff.
And it needs to be able to handle Unicode too.
Does anyone know of such a component? Freeware with source would of course be preferred, as I would be willing to do my own code maintenance and contribute back, but, failing that, commercial solutions are OK too.
TVMIA
Cheers,
Adem
Robert Baker Guest
Posted: Sun Jul 24, 2005 6:42 am Post subject: Re: HTML Parser
Adem,
I recently found "Extended IEParser V2" at
http://groups.yahoo.com/group/delphi-webbrowser/files/. To get the row and column data I needed, I had to write a few lines of code, and the HTML was not parsed into "objects", but you could certainly add that feature easily enough.
BTW, the price [free] was right for my needs.
For a modest fee, DIHtmlParser [at
http://www.zeitungsjunge.de/delphi/htmlparser/] looked like it might be more advanced. Its price seems reasonable, but I got what I needed from the free one, so I did not evaluate it thoroughly and cannot address its "object" nature.
Best of luck,
Robert Baker
"adem" <adembaba (AT) excite (DOT) com> wrote:
> I am looking for a HTML parser that will turn a HTML page
> into proper objects
listmember Guest
Posted: Sun Jul 24, 2005 12:30 pm Post subject: Re: HTML Parser
Robert Baker wrote:
Hi Robert,
I did try to get that file, but it seems it has been removed from there.
I did join the group first :-)
> but you could certainly add that feature easily enough.
Can you or somebody else help me with this bit? It is probably easier than I think, but I don't know how to do it.
> BTW, the price [free] was right for my needs.
> For a modest fee DIHtmlParser [at
> http://www.zeitungsjunge.de/delphi/htmlparser/] looked like it might
> be more advanced. Its price seems reasonable but I got what I needed
> from the free one so I did not evaluate it thoroughly and cannot
> address its "object" nature.
I am aware of DIHtmlParser, but it only parses HTML pages. There are others that handle the parsing well enough for me.
Thanks :-)
Actually, what I am looking for is something like Ben Ziegler's components (does anyone remember WABD?)
http://www.radix.net/~bziegler/Delphi/Index.html
now kbmWABD
http://www.components4programmers.com/products/kbmwabd/
but without the visual aspects.
Cheers,
Adem
John McTaggart Guest
Posted: Sun Jul 24, 2005 6:04 pm Post subject: Re: HTML Parser
> I am looking for a HTML parser that will turn a HTML page
> into proper objects
> i.e. tables that I can access by row and column, text
> that I can get font/charset information etc.
I wrote one called ATagParser that handles pretty much anything based on a tag format. It can easily handle table rows and columns, but is limited to ANSI and UTF-8. Actually, it can handle UTF-16LE and UTF-16BE as well, but converts them to their UTF-8 equivalents.
http://www.compnet101.com/atagparser
The latest version is not online, but if you send me a message with the version of Delphi that you're using, I'll send you the latest package.
BTW, I only have Delphi up to D7. I know the source compiles under all versions, including D8, 2005 and even BCB6, but I don't own any of them, sorry.
John McTaggart
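ATagParser's source isn't shown in the thread, so the following is only a minimal Python sketch of the kind of UTF-16 to UTF-8 conversion John describes. It assumes the encoding is detected from a byte-order mark alone. Anything without a BOM is passed through as already ANSI/UTF-8, and UTF-32 BOMs are not considered. The function name `to_utf8` is hypothetical.

```python
import codecs

def to_utf8(raw: bytes) -> bytes:
    """Return raw page bytes as UTF-8 (hypothetical helper).

    Assumption: only a byte-order mark is used for detection. A UTF-8 BOM is
    stripped, UTF-16LE/BE input is transcoded, and everything else is
    returned unchanged (treated as ANSI/UTF-8 already).
    """
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    if raw.startswith(codecs.BOM_UTF16_LE):      # bytes FF FE
        return raw[2:].decode("utf-16-le").encode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_BE):      # bytes FE FF
        return raw[2:].decode("utf-16-be").encode("utf-8")
    return raw
```

Normalizing everything to one encoding up front means the tag scanner itself only has to deal with a single byte format.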
listmember Guest
Posted: Sun Jul 24, 2005 7:32 pm Post subject: Re: HTML Parser
John McTaggart wrote:
John,
I took a look at your site. From the property editor
screenshot, I can see that it is event-based. While
this is very good for speed as a general tag-based
document parser, it leaves me with the job of constructing
and filling all the data for the object tree I need to have.
What I am looking for is one that turns all that information
into objects.
Cheers,
Adem
Robert Baker Guest
Posted: Sun Jul 24, 2005 9:55 pm Post subject: Re: HTML Parser
Adem,
RE: "I did try to get that file, but it seems it has been removed from there. I did join the group first :-)"
Hmm, no idea why you had this problem. I tried again just now and the file downloaded perfectly. BTW, I had to muck around with it a bit, as it was not updated for D7. In any case, it sounds like you will not be happy with it, as the parser does not place the results into "objects". Please let me know if you find such a solution.
Good luck,
Robert
John McTaggart Guest
Posted: Sun Jul 24, 2005 11:20 pm Post subject: Re: HTML Parser
> I took a look at your site. From the property editor
> screenshot, I can see that it is event-based. While
> this is very good for speed as a general tag-based
> document parser, it leaves me with the job of constructing
> and filling all the data for the object tree I need to have.
> What I am looking for is one that turns all that information
> into objects.
Understandable.
I wrote it as a base that could be used to construct higher-level objects like trees.
I use it to parse several RSS feeds into a tree, and it's pretty simple. Each tag, text run, data block, etc. is returned in its corresponding event as a persistent object with an index number.
Having that, building the tree is pretty simple.
Speed-wise, it's way snappy.
John McTaggart
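John's point is that an event-based parser can be the base for an object tree. As a rough illustration only, here is a minimal Python sketch that uses the standard library's event-based `html.parser.HTMLParser` in place of ATagParser, whose API isn't shown in the thread. The names `Node`, `TreeBuilder`, `text_of` and `table_grid` are hypothetical. The sketch assumes reasonably well-formed markup: omitted `</td>` tags and nested tables are not handled.

```python
from html.parser import HTMLParser

class Node:
    """One element or text run in the (hypothetical) object tree."""
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag                    # "#text" marks a text node
        self.attrs = dict(attrs or [])    # e.g. font face/size attributes
        self.parent = parent
        self.children = []
        self.text = ""

class TreeBuilder(HTMLParser):
    """Turn HTMLParser's start/end/data events into a Node tree."""
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.current = self.root          # innermost open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node           # descend until the matching end tag

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            text = Node("#text", parent=self.current)
            text.text = data.strip()
            self.current.children.append(text)

def iter_nodes(node):
    """Depth-first walk over a node and all its descendants."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)

def text_of(node):
    """Concatenate the text runs under a node, separated by spaces."""
    return " ".join(n.text for n in iter_nodes(node) if n.tag == "#text")

def table_grid(table):
    """Return a table's cell texts so they can be read as grid[row][col]."""
    return [[text_of(cell) for cell in row.children if cell.tag in ("td", "th")]
            for row in iter_nodes(table) if row.tag == "tr"]

builder = TreeBuilder()
builder.feed("<table><tr><th>Name</th><th>Qty</th></tr>"
             "<tr><td>Apples</td><td><b>3</b></td></tr></table>")
builder.close()
table = next(n for n in iter_nodes(builder.root) if n.tag == "table")
grid = table_grid(table)
print(grid[1][0], grid[1][1])             # Apples 3
```

The builder's only state is a pointer to the current open element. Each start event appends a child and descends, and each end event climbs back up. This is the "constructing and filling" work Adem would otherwise have to write on top of the events.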
Eddie Shipman Guest
Posted: Tue Aug 09, 2005 1:51 pm Post subject: Re: HTML Parser
In article <xn0e54chl9zopsq01y (AT) forums (DOT) borland.com>, adembaba (AT) excite (DOT) com
says...
> I am looking for a HTML parser that will turn a HTML page
> into proper objects
> i.e. tables that I can access by row and column, text
> that I can get font/charset information etc.
> I am not really interested in images/pictures; just the
> text stuff.
> And, it needs to be able to handle Unicode too.
> Does anyone know of such a component --freeware with
> source would of course be preferred as I would be
> willing to do my own code maintenance and contribute
> back, but --failing that-- commercial solutions are
> OK too.
It sounds like you are talking about the MSHTML DOM.
You can get all of this from the DOM just by examining the correct properties.
And it handles Unicode if your browser version does.