BorlandTalk.com
Borland discussion newsgroups
 

Word DOC to HTML VCL?

 
Forum: Delphi Non-Technical
Chris Cheah
Guest

Posted: Thu Jul 22, 2004 11:15 pm    Post subject: Word DOC to HTML VCL?

Hello:

Can anyone recommend a good VCL for D7/8 that can read a Word Doc file and
convert to plain HTML, stripping the unnecessary MS formatting tags?

Thanks

Chris


John Kaster (Borland)
Guest

Posted: Fri Jul 23, 2004 12:36 am    Post subject: Re: Word DOC to HTML VCL?

Chris Cheah in <41004aae$1 (AT) newsgroups (DOT) borland.com> wrote:

Quote:
Can anyone recommend a good VCL for D7/8 that can read a Word Doc
file and convert to plain HTML, stripping the unnecessary MS formatting
tags?

I would hazard a guess that anyone doing this would be automating Word
for the initial export. Then writing an HTML parser to strip out the
unnecessary stuff would be the next step.

Here's one non-automated way to do this:
http://blogs.borland.com/johnk/archive/2004/06/04/10.aspx
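
The second step of that approach (Word does the HTML export, then a parser strips the leftovers) can be sketched as follows. This is a minimal illustration written in Python with only the standard library's `html.parser`, not Delphi code and not the method from the linked blog post. Which markup counts as "unnecessary" is an assumption made for this sketch: all comments (Word's conditional `<!--[if gte mso 9]>` blocks live there), `<style>`/`<script>`/`<xml>` blocks, namespaced tags such as `<o:p>`, `<span>`/`<font>`/`<meta>`/`<link>` tags, and `style`, `lang`, namespaced, and `Mso*` class attributes. The names `WordHtmlCleaner` and `clean_word_html` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists of "unnecessary" Word markup; adjust to taste.
DROP_WITH_CONTENT = {"style", "script", "xml"}     # drop the tag and everything inside
DROP_TAG_ONLY = {"span", "font", "meta", "link"}  # drop the tag, keep inner text
DROP_ATTRS = {"style", "lang"}


class WordHtmlCleaner(HTMLParser):
    """Re-emits HTML while skipping Word-specific tags, attributes and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # nesting depth inside a DROP_WITH_CONTENT element

    def _drop_tag(self, tag):
        # Namespaced tags such as <o:p> or <v:shape> are Office-only.
        return self._skip > 0 or tag in DROP_TAG_ONLY or ":" in tag

    @staticmethod
    def _keep_attr(name, value):
        if name in DROP_ATTRS or ":" in name or name == "xmlns":
            return False
        # Word tags paragraphs with classes like MsoNormal.
        return not (name == "class" and value and value.startswith("Mso"))

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip += 1
            return
        if self._drop_tag(tag):
            return
        rendered = []
        for name, value in attrs:
            if self._keep_attr(name, value):
                rendered.append(f" {name}" if value is None else f' {name}="{escape(value)}"')
        self.parts.append(f"<{tag}{''.join(rendered)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing forms like <br/> or <o:p/>: emit at most a start tag.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif not self._drop_tag(tag):
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text was unescaped by the parser, so escape it again on output.
        if self._skip == 0:
            self.parts.append(escape(data, quote=False))

    # handle_comment is deliberately not overridden, so every comment
    # (including Word's conditional-comment blocks) is discarded.


def clean_word_html(html: str) -> str:
    cleaner = WordHtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.parts)
```

Dropping every `<span>` also loses any formatting that only lived in inline styles; a fuller cleaner would probably map a few styles (bold, italic, color) onto simpler tags instead of discarding them.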


--
John Kaster, Borland Developer Relations, http://bdn.borland.com
BorCon2004, all info in one place! http://info.borland.com/conf2004
Features and bugs: http://qc.borland.com
Get source: http://cc.borland.com
Unofficial information overload: http://blogs.borland.com

Chris Cheah
Guest

Posted: Fri Jul 23, 2004 12:46 am    Post subject: Re: Word DOC to HTML VCL?

John:

Thanks for the pointer. Yes, it would be useful ... but only if I can include
the units in D7 and recompile them into my code without any manual
intervention; something like reading a TStream and parsing the data.

Any hope of doing this in D7 now?

Thanks

Chris

John Kaster (Borland)
Guest

Posted: Fri Jul 23, 2004 5:04 am    Post subject: Re: Word DOC to HTML VCL?

Chris Cheah in <41005ff1$1 (AT) newsgroups (DOT) borland.com> wrote:

Quote:
Any hope of doing this in D7 now?

The HTML Tidy wrapper Steve Trefethen used for D8 was written in
Delphi, so I think it would work in Win32 as well.

I believe it's this one: http://houston.quik.com/~jkp/tidypas/

Chris Cheah
Guest

Posted: Fri Jul 23, 2004 5:15 am    Post subject: Re: Word DOC to HTML VCL?

Many thanks John!
Much appreciation too. I hope this will strip out much of the MS crap from
the HTML!

Regards
Chris

John Kaster (Borland)
Guest

Posted: Fri Jul 23, 2004 5:41 pm    Post subject: Re: Word DOC to HTML VCL?

Chris Cheah in <41009ed7$1 (AT) newsgroups (DOT) borland.com> wrote:

Quote:
Much appreciation too. I hope this will strip out much of the MS crap
from the HTML!

It certainly does help. Just not sure it's enough.

I'm facing exactly this issue for BDN, and I need to come up with an
automated solution for it as well. If it's something I can provide to
others (it looks like it may be using some of our proprietary IDE
technology at this point), I'll make it available. Otherwise, I'll
provide the pieces I can.


Adrian Gallero
Guest

Posted: Fri Jul 23, 2004 9:00 pm    Post subject: Re: Word DOC to HTML VCL?

Chris Cheah wrote:

Quote:
John:

Thanks for the pointer. Yes, it would be useful ... but only if I can
include the units in D7 and recompile them into my code without any
manual intervention; something like reading a TStream and parsing the
data.

Any hope of doing this in D7 now?

Thanks

Chris
"John Kaster (Borland)" <johnk (AT) borland (DOT) com> wrote in message
news:41005d82$1 (AT) newsgroups (DOT) borland.com...
Chris Cheah in <41004aae$1 (AT) newsgroups (DOT) borland.com> wrote:

Can anyone recommend a good VCL for D7/8 that can read a Word Doc
file and convert to plain HTML, stripping the unnecessary MS
formatting tags?

I would hazard a guess that anyone doing this would be automating
Word for the initial export. Then writing an HTML parser to strip
out the unnecessary stuff would be the next step.

Here's one non-automated way to do this:
http://blogs.borland.com/johnk/archive/2004/06/04/10.aspx

Hi,

To strip the HTML, you can also try the Microsoft plug-in for Office, at
http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN

I used it a long time ago and wasn't very impressed with the savings, but
it might be another option.

Regards,
Adrian.

Mike Shkolnik
Guest

Posted: Fri Jul 23, 2004 9:02 pm    Post subject: Re: Word DOC to HTML VCL?

Chris,

as I answered you privately today, our TSMWordDocument will do what you
need, but without any formatting (colors, fonts, etc.).

Only plain text can be extracted from a .doc file (without MS Word
installed, of course). The next version will support formatting too, but
for now only plain text is available.

--
With best regards, Mike Shkolnik
E-mail: mshkolnik (AT) scalabium (DOT) com
WEB: http://www.scalabium.com

Chris Cheah
Guest

Posted: Sat Jul 24, 2004 3:14 am    Post subject: Re: Word DOC to HTML VCL?

Hi Mike:

Yes, please keep me posted when you can retain at least the color/font and
some rich text properties.

Regards

Chris

John Kaster (Borland)
Guest

Posted: Sat Jul 24, 2004 6:26 am    Post subject: Re: Word DOC to HTML VCL?

Adrian Gallero in <41017c57 (AT) newsgroups (DOT) borland.com> wrote:

Quote:
To strip the HTML, you can also try the Microsoft plug-in for Office,

I think this is the "Filtered HTML" output option in Office 2003.

Quote:
I used it a long time ago and wasn't very impressed with the savings,
but it might be another option.

And you're right, there's definitely more to be stripped (and that
Delphi 8 will strip) than that filter ends up stripping.


John Kaster (Borland)
Guest

Posted: Sat Jul 24, 2004 6:27 am    Post subject: Re: Word DOC to HTML VCL?

Chris Cheah in <4101d40f$1 (AT) newsgroups (DOT) borland.com> wrote:

Quote:
Yes, please keep me posted when you can retain at least the color/font
and some rich text properties.

If you're interested in going the rich text route, you could export as
rich text and parse that more easily. I know rich text parsers are more
common, but I haven't looked at them since about 1996.
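
The rich text route can be sketched with a small tokenizer, again in Python rather than Delphi and only as an illustration. It handles a deliberately tiny subset of RTF, chosen as an assumption for this sketch: groups, the `\b`/`\i` toggles, `\plain`, `\par`, the `\{ \} \\ \~` control symbols, `\'hh` hex escapes (decoded as Windows-1252, which only holds for `\ansicpg1252` documents), and skipping of the font table, color table, stylesheet, info, picture, and `\*` destinations. Colors, fonts, tables, lists, and `\u` Unicode escapes are not handled. The function name `rtf_to_html` is hypothetical.

```python
import re
from html import escape

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"    # control word, optional numeric parameter, one delimiter space
    r"|\\'([0-9a-fA-F]{2})"    # hex-escaped byte such as \'e9
    r"|\\([^a-z'])"            # control symbol such as \{ \} \\ \~ \*
    r"|([{}])"                 # group start or end
    r"|([^\\{}\r\n]+)"         # run of plain text (raw CR/LF are ignored in RTF)
)
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def rtf_to_html(rtf: str) -> str:
    state = {"b": False, "i": False, "skip": False}  # character state of the current group
    stack = []       # saved states of the enclosing groups
    paragraphs = []  # finished <p>...</p> strings
    current = []     # HTML pieces of the paragraph being built
    open_tags = []   # inline tags currently open in `current`

    def emit_text(text):
        wanted = [t for t in ("b", "i") if state[t]]
        if wanted != open_tags:
            # Close everything and reopen what is needed, so nesting stays valid.
            current.extend(f"</{t}>" for t in reversed(open_tags))
            current.extend(f"<{t}>" for t in wanted)
            open_tags[:] = wanted
        current.append(escape(text, quote=False))

    def end_paragraph():
        current.extend(f"</{t}>" for t in reversed(open_tags))
        open_tags.clear()
        if current:
            paragraphs.append("<p>" + "".join(current) + "</p>")
        current.clear()

    for m in TOKEN.finditer(rtf):
        word, param, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(dict(state))
        elif brace == "}":
            if stack:
                state.update(stack.pop())
        elif state["skip"]:
            continue  # inside a destination we do not render
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                state["skip"] = True
            elif word in ("b", "i"):
                state[word] = param != "0"  # \b and \b1 switch on, \b0 switches off
            elif word == "plain":
                state["b"] = state["i"] = False
            elif word == "par":
                end_paragraph()
        elif symbol is not None:
            if symbol == "*":
                state["skip"] = True  # ignorable destination, e.g. {\*\generator ...}
            elif symbol in "\\{}":
                emit_text(symbol)
            elif symbol == "~":
                emit_text("\u00a0")  # non-breaking space
        elif hexcode is not None:
            # Assumption: the document uses the Windows-1252 code page.
            emit_text(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif text is not None:
            emit_text(text)
    end_paragraph()
    return "\n".join(paragraphs)
```

Saving a copy of the character state at each `{` and restoring it at the matching `}` is what makes a `\b` inside a group end automatically when the group closes; for example `{\rtf1 x{\b y}z\par}` becomes `<p>x<b>y</b>z</p>`.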


All times are GMT.