BorlandTalk.com Borland discussion newsgroups
Chris Cheah Guest
Posted: Thu Jul 22, 2004 11:15 pm Post subject: Word DOC to HTML VCL?
Hello:
Can anyone recommend a good VCL for D7/8 that can read a Word Doc file and
convert it to plain HTML, stripping the unnecessary MS formatting tags?
Thanks
Chris
John Kaster (Borland) Guest
Chris Cheah Guest
Posted: Fri Jul 23, 2004 12:46 am Post subject: Re: Word DOC to HTML VCL?
John:
Thanks for the pointer. Yes, it would be useful ... but only if I can include
the units in D7 and recompile them into my code without any manual intervention;
something like reading a TStream and parsing the data.
Any hope of doing this in D7 now?
Thanks
Chris
"John Kaster (Borland)" <johnk (AT) borland (DOT) com> wrote
John Kaster (Borland) Guest
Chris Cheah Guest
Posted: Fri Jul 23, 2004 5:15 am Post subject: Re: Word DOC to HTML VCL?
Many thanks John!
Much appreciation too. I hope this will strip out much of the MS crap from
the HTML!
Regards
Chris
"John Kaster (Borland)" <johnk (AT) borland (DOT) com> wrote
John Kaster (Borland) Guest
Posted: Fri Jul 23, 2004 5:41 pm Post subject: Re: Word DOC to HTML VCL?
Chris Cheah in <41009ed7$1 (AT) newsgroups (DOT) borland.com> wrote:
> Much appreciation too. I hope this will strip out much of the MS crap
> from the HTML!
It certainly does help. Just not sure it's enough.
I'm facing exactly this issue for BDN, and I need to come up with an
automated solution for it as well. If it's something I can provide to
others (looks like it may be using some of our proprietary IDE
technology at this point) I'll make it available. Otherwise, I'll
provide the pieces I can.
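For the "HTML parser to strip out the unnecessary stuff" step discussed in this thread, a minimal sketch using Python's standard html.parser module might look like the following. It is an illustration only, not Borland's or John's tool, and is written in Python rather than Delphi for brevity; clean_word_html, WordHtmlCleaner and the tag/attribute lists are hypothetical names and assumptions about typical Word HTML output, not a complete inventory. It drops comments (including Word's conditional-comment blocks), style blocks and Office-namespace tags such as o:p, unwraps span and font, and keeps only a small whitelist of attributes.

```python
from html import escape
from html.parser import HTMLParser

# Assumed lists for illustration; Word's HTML export uses more than these.
DROP_WITH_CONTENT = {"style", "script", "xml"}       # drop tag and its contents
UNWRAP = {"span", "font"}                            # drop tag, keep its text
OFFICE_PREFIXES = ("o:", "w:", "v:", "m:", "st1:")   # Office namespace tags
KEEP_ATTRS = {"href", "src", "alt", "colspan", "rowspan"}


class WordHtmlCleaner(HTMLParser):
    """Re-emits HTML without Office tags, comments, or class/style attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT element

    def _dropped(self, tag):
        return tag in UNWRAP or tag.startswith(OFFICE_PREFIXES)

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif not self.skip_depth and not self._dropped(tag):
            kept = "".join(f' {k}="{escape(v or "")}"'
                           for k, v in attrs if k in KEEP_ATTRS)
            self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing forms such as <br/> or <o:p/>: emit at most a start tag.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and not self._dropped(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Comments, including <!--[if gte mso 9]>...<![endif]--> blocks, reach
    # HTMLParser's default handle_comment, which ignores them.


def clean_word_html(html_text):
    cleaner = WordHtmlCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)
```

Dropping every class and style attribute is deliberately blunt: it removes Word's MsoNormal classes and mso- style properties, but also any colors or fonts the author meant to keep.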
--
John Kaster, Borland Developer Relations, http://bdn.borland.com
BorCon2004, all info in one place! http://info.borland.com/conf2004
Features and bugs: http://qc.borland.com
Get source: http://cc.borland.com
Unofficial information overload: http://blogs.borland.com
Adrian Gallero Guest
Posted: Fri Jul 23, 2004 9:00 pm Post subject: Re: Word DOC to HTML VCL?
Chris Cheah wrote:
> John:
> Thanks for the pointer. Yes, it would be useful ... but only if I can
> include the units in D7 and recompile them into my code without any
> manual intervention; something like reading a TStream and parsing the
> data.
> Any hope of doing this in D7 now?
> Thanks
> Chris
> "John Kaster (Borland)" <johnk (AT) borland (DOT) com> wrote in message
> news:41005d82$1 (AT) newsgroups (DOT) borland.com...
>> Chris Cheah in <41004aae$1 (AT) newsgroups (DOT) borland.com> wrote:
>>> Can anyone recommend a good VCL for D7/8 that can read a Word Doc
>>> file and convert it to plain HTML, stripping the unnecessary MS
>>> formatting tags?
>> I would hazard a guess that anyone doing this would be automating
>> Word for the initial export. Then writing an HTML parser to strip
>> out the unnecessary stuff would be the next step.
>> Here's one non-automated way to do this:
>> http://blogs.borland.com/johnk/archive/2004/06/04/10.aspx
Hi,
To strip the HTML, you can also try the Microsoft plugin for Office, at
http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN
I used it a long time ago and wasn't very impressed with the savings, but
it might be another option.
Regards,
Adrian.
Mike Shkolnik Guest
Posted: Fri Jul 23, 2004 9:02 pm Post subject: Re: Word DOC to HTML VCL?
Chris,
as I told you privately today, our TSMWordDocument will do what you
need, but without any formatting (colors, fonts, etc.).
Only plain text can be extracted from a doc file (no MS Word installation
is required, of course).
Formatting will be supported in the next version, but for now only plain
text is available.
--
With best regards, Mike Shkolnik
E-mail: mshkolnik (AT) scalabium (DOT) com
WEB: http://www.scalabium.com
Chris Cheah Guest
Posted: Sat Jul 24, 2004 3:14 am Post subject: Re: Word DOC to HTML VCL?
Hi Mike:
Yes, please keep me posted when you can retain at least the color/font and
some rich-text properties.
Regards
Chris
John Kaster (Borland) Guest
Posted: Sat Jul 24, 2004 6:26 am Post subject: Re: Word DOC to HTML VCL?
Adrian Gallero in <41017c57 (AT) newsgroups (DOT) borland.com> wrote:
> To strip the HTML, you can also try the Microsoft plugin for Office,
I think this is the "Filtered HTML" output option in Office 2003.
> I used it a long time ago and wasn't very impressed with the savings,
> but it might be another option.
And you're right, there's definitely more to be stripped (and that
Delphi 8 will strip) than that filter ends up stripping.
John Kaster (Borland) Guest
Posted: Sat Jul 24, 2004 6:27 am Post subject: Re: Word DOC to HTML VCL?
Chris Cheah in <4101d40f$1 (AT) newsgroups (DOT) borland.com> wrote:
> Yes, please keep me posted when you can retain at least the color/font
> and some rich-text properties.
If you're interested in going the rich text route, you could export as
rich text and parse that more easily. I know rich text parsers are more
common, but I haven't looked at them since about 1996.
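If you go the rich-text route John suggests, a small tokenizer is enough to pull the text out. The Python sketch below is illustrative only: rtf_to_text, the skipped-destination list and the cp1252 default are assumptions (a real file declares its code page with \ansicpgN). It walks control words, hex escapes and groups, skips font/color/style tables and \* destinations, honors \uN with its \ucN fallback, and discards all formatting. It does not handle \bin data, and keeping color/font as Chris wanted would mean tracking \b, \cf, \f and similar words per group.

```python
import re

# Hypothetical helper for illustration. Destinations whose contents are
# metadata rather than body text (a partial list, chosen as an assumption).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict",
                     "header", "footer", "listtable", "listoverridetable"}
# Control words that stand for a character in the text stream.
SPECIAL = {"par": "\n", "line": "\n", "tab": "\t",
           "emdash": "\u2014", "endash": "\u2013", "bullet": "\u2022"}

TOKEN = re.compile(
    r"\\([a-z]{1,32})(-?\d{1,10})? ?"  # control word, optional numeric arg
    r"|\\'([0-9a-f]{2})"               # hex-escaped byte, e.g. \'e9
    r"|\\([^a-z])"                     # control symbol, e.g. \{ \\ \~ \*
    r"|([{}])"                         # group open / close
    r"|[\r\n]+"                        # raw line breaks carry no text
    r"|(.)",                           # ordinary text character
    re.IGNORECASE | re.DOTALL,
)


def rtf_to_text(rtf, codepage="cp1252"):
    """Return the body text of an RTF document, discarding all formatting."""
    stack = []          # (ignorable, uc_skip) saved at each "{"
    ignorable = False   # True inside a destination we are skipping
    uc_skip = 1         # \ucN: fallback characters that follow each \uN
    pending_skip = 0    # fallback characters still to discard
    out = []
    for m in TOKEN.finditer(rtf):
        word, arg, hexcode, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append((ignorable, uc_skip))
            pending_skip = 0
        elif brace == "}":
            ignorable, uc_skip = stack.pop() if stack else (False, 1)
            pending_skip = 0
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                ignorable = True
            elif word == "uc":
                uc_skip = int(arg or 1)
            elif ignorable:
                pass
            elif word == "u" and arg is not None:
                code = int(arg)  # signed 16-bit in RTF; negatives wrap
                out.append(chr(code + 65536 if code < 0 else code))
                pending_skip = uc_skip
            elif word in SPECIAL:
                out.append(SPECIAL[word])
        elif hexcode is not None:
            if pending_skip:
                pending_skip -= 1
            elif not ignorable:
                out.append(bytes([int(hexcode, 16)]).decode(codepage, errors="replace"))
        elif symbol is not None:
            if symbol == "*":
                ignorable = True  # skip every \* (optional) destination
            elif not ignorable and symbol in "\\{}":
                out.append(symbol)
            elif not ignorable and symbol == "~":
                out.append("\u00a0")  # non-breaking space
            elif not ignorable and symbol in "\r\n":
                out.append("\n")  # backslash + line break acts like \par
        elif char is not None:
            if pending_skip:
                pending_skip -= 1
            elif not ignorable:
                out.append(char)
    return "".join(out)
```

For example, rtf_to_text on a Word-style document with a font table, a bold group and a \par returns just the words, with the \par turned into a newline.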