Re: Word 2000 HTML conversion

Subject: Re: Word 2000 HTML conversion
From: Ray Dembek <RFDembek -at- MEDIAONE -dot- NET>
Date: Wed, 18 Aug 1999 12:18:06 -0400

You can download the Microsoft Office HTML Filter, a tool you can use to
remove Office-specific markup tags embedded in Office 2000 documents saved
as HTML. See
http://officeupdate.microsoft.com/2000/downloadDetails/htmlfilter.htm.
Installing this filter also adds an Export to HTML command to the File
menu in Word 2000.

The HTML this produces is not "minimal," but it is a lot cleaner than what
you get when you save a Word doc as a web page. The "Hello, World" test now
results in a 5K HTML document, which still has more markup than I want.

The minimal document would contain:

<html>
<head>
<title>Hello, World</title>
</head>
<body>
<p>Hello, World</p>
</body>
</html>

The 5K HTML from the HTML filter is:

<html>
<head>
<meta name=Generator content="Microsoft Office HTML Filter">
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Originator content="Microsoft Word 9">
<title>Hello, World</title>
<style>
<!--

p.MsoNormal, li.MsoNormal, div.MsoNormal
{
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Times New Roman";}
-->
</style>
</head>

<body lang=EN-US>

<div class=Section1>

<p class=MsoNormal>Hello, World</p>

</div>

</body>

</html>
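
If you want to get from the filtered output above to the minimal document,
the remaining residue can be stripped with a short script. What follows is
only a rough sketch, not part of the Microsoft filter: the function name
strip_office_residue is made up for illustration, and it assumes the
regular expressions only ever see output shaped like the sample above
(a real cleanup should use a proper HTML parser). Note that it also drops
the charset <meta> tag, so it assumes you declare the encoding some other
way.

```python
import re

# The filtered "Hello, World" output quoted above.
FILTERED = """<html>
<head>
<meta name=Generator content="Microsoft Office HTML Filter">
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Originator content="Microsoft Word 9">
<title>Hello, World</title>
<style>
<!--

p.MsoNormal, li.MsoNormal, div.MsoNormal
{
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Times New Roman";}
-->
</style>
</head>

<body lang=EN-US>

<div class=Section1>

<p class=MsoNormal>Hello, World</p>

</div>

</body>

</html>
"""

# The minimal document quoted above.
MINIMAL = """<html>
<head>
<title>Hello, World</title>
</head>
<body>
<p>Hello, World</p>
</body>
</html>"""


def strip_office_residue(html: str) -> str:
    """Hypothetical helper: remove the markup the filter leaves behind."""
    flags = re.IGNORECASE | re.DOTALL
    # Drop the whole <style> block, including the Mso* class rules.
    html = re.sub(r"<style\b.*?</style>\s*", "", html, flags=flags)
    # Drop every <meta> tag (Generator, Originator, and Content-Type).
    html = re.sub(r"<meta\b[^>]*>\s*", "", html, flags=flags)
    # Drop class= and lang= attributes, double-quoted or unquoted.
    # Assumption: such text never appears outside a tag in the input.
    html = re.sub(r'\s+(?:class|lang)=(?:"[^"]*"|[^\s>]+)', "", html, flags=flags)
    # Unwrap the <div> section wrapper, keeping its contents.
    html = re.sub(r"</?div\b[^>]*>\s*", "", html, flags=flags)
    # Collapse the blank lines left behind.
    html = re.sub(r"\n\s*\n+", "\n", html)
    return html.strip()


if __name__ == "__main__":
    print(strip_office_residue(FILTERED))
```

Run against the sample, this should give back the eight-line minimal
document. It will not cope with the full Word 2000 "Save as Web Page"
output, which carries far more than class attributes and a style block.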

-----Original Message-----
From: Technical Writers List; for all Technical Communication issues
[mailto:TECHWR-L -at- LISTSERV -dot- OKSTATE -dot- EDU] On Behalf Of Eric J. Ray
Sent: Wednesday, August 18, 1999 11:22 AM
To: TECHWR-L -at- LISTSERV -dot- OKSTATE -dot- EDU
Subject: Re: Word 2000 HTML conversion

> Using Word 97 to convert a Word doc to HTML results in a lot of extraneous
> tags and a lot of manual clean-up. Someone told me here that Word 2000
> produces a tighter, cleaner file.
>
> Has anyone out there tried this? Confirm? Deny?

It's worse. Because Microsoft claims that you can round-trip
files from Word to HTML and back to Word without losing
formatting, they've had to add a ton of XML (pseudo XML,
actually, with a lot of non-standard namespace issues)
to each HTML document. When tech editing a new book on
MS Word 2000, I tested it with the classic "Hello, World"
and nothing else visible in the Word file, and came up with
just about 100 lines of code in the HTML document.

It might be a non-issue on an intranet, but you've got
enough bloat to be a serious issue on the Internet.

Eric

--
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Eric J. Ray ejray -at- raycomm -dot- com
UNIX Visual QuickStart Guide is "a superb book!"
Don't believe it? Check for yourself!
Find out more at http://www.raycomm.com/
