Re: Converting to text

Subject: Re: Converting to text
From: Eric Ray <ejray -at- raycomm -dot- com>
To: techwr-l <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Tue, 22 Feb 2000 11:55:23 -0700


> I've asked this before, I'll ask it again (in hopes that someone has
> invented some new whiz-bang gizmo):

It's not new, but ...

> Does anyone know of a macro, tool, program, utility, potion, magic spell,
> and/or sacrificial offering that will convert a Word document to *readable*
> text?
>
> Readable is the key word here, folks. It's possible to simply "Save
> As=text", but when you try that on a document that has columns, tables, or
> other formatting, you are, in a word, screwed. Another problem we've
> encountered with "Save As=text" is the unique and mystifying ability of
> Word to take a 20-page document and turn it into one long line of text--no
> paragraph breaks, no page breaks--nothing!

<SNIP>

> c'mon...share your knowledge...c'mon...please...show me how to do the
> dance...

Save the Word doc as HTML (or convert to HTML via whatever means you
choose). It doesn't have to be great at all--just a rudimentary
conversion with all of the extraneous crap that Word tosses in will be
just fine.

Then, find a Unix command line so you can use the lynx (character-based)
Web browser. Use:
lynx -dump filename.html > filename.txt
You can dump multiple files or directories with this and tweak options
to get exactly what you need. For example, using this:
lynx -dump -nolist -image_links off -pseudo_inlines off \
-width 60 http://www.raycomm.com/techwhirl/ > testoutput.txt

I got the following output from the TECHWR-L home page at
http://www.raycomm.com/techwhirl/, despite its fairly complex table
structure and lots of images:

(Note that I scrunched the width to 60 characters for better emailing, and
just because I could.)


[ads.pl?page=08]-[ads.pl?page=08]

The Official TECHWR-L Web Site
[navigationbutton.gif] Home [navigationbutton.gif]
What's New [navigationbutton.gif] Subscriber
Central [navigationbutton.gif] Calendar
[navigationbutton.gif] Contractors
[navigationbutton.gif] Directory
[navigationbutton.gif] Topics
[navigationbutton.gif] Chat
[navigationbutton.gif] Archive
[navigationbutton.gif] Features
[navigationbutton.gif] Contact Us

[ads_side.pl?page=07]-[ads_side.pl?page=07]

[mini.gif] Subscriber Central

To subscribe
Enter and submit your email address:
_______________ Join

To explore options
Click here to unsubscribe, set nomail, receive
digests, search archives, or check your
subscription status.

Welcome to The Official TECHWR-L, the
award-winning Web site supporting the TECHWR-L
listserv list. TECHWR-L is an unmoderated
discussion forum for technical communication
topics. If you're a technical writer, editor,
indexer, teacher, student, or just someone
interested in technical communication topics, join
the list and benefit from over 4400 subscribers'
expertise, education, and experience.
* What's New: Find out what's new on the site
and register to receive an automatic email
notification every time this site is updated.
* Subscriber Central: Explore the one-stop
location where you can subscribe, unsubscribe,
see posting rules, set options, and check your
subscription status.
* Calendar: Find information about technical
writing related events and activities; post
your event or activity here, too!
* Contractors: Search the contractors database
or add your entry to the nearly 600
contractors already listed.
* Directory: Access this extensive resources
directory, which contains a wide variety of
technical communication books and Web sites.
* Topics: Browse summaries of topics already
discussed on TECHWR-L.
* Chat: Chat with other technical communicators
in this chat forum.
* Archive: Browse or search the complete
TECHWR-L archive.
* Features: Check out "TECHWR-L: Then and Now,"
which highlights people, topics, and trivia
from TECHWR-L's history.
* Contact Us: Contact the TECHWR-L list
listowner or this site's Webmaster.
_____________________________________________

Search the TECHWR-L Site _____________ Go!
Home | What's New | Subscriber Central | Calendar
Contractors | Directory | Topics | Chat
Archive | Features | Contact Us
Site Policies | Site Ad Info | List Ads and
Sponsorships Info

Last updated on 18 January, 2000
Site contents Copyright (c) 1997, 1998, 1999, 2000
RayComm, Inc.
Send comments to Deborah Ray.
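Since lynx can be pointed at one file after another, a small shell loop can convert a whole directory of Word-generated HTML files in one go. This is a minimal sketch, not part of the original post: the dump_all function name is hypothetical, and it assumes a POSIX shell with lynx installed and on the PATH. It uses only a subset of the options from the example above.

```sh
#!/bin/sh
# Hypothetical helper (not from the original post): run lynx -dump on
# every .html file in a directory and write a matching .txt file.
# Assumes lynx is installed and on the PATH.
dump_all() {
  dir=${1:-.}                      # directory to scan; defaults to the current one
  for f in "$dir"/*.html; do
    [ -e "$f" ] || continue        # skip the literal pattern if nothing matched
    # -nolist drops the trailing list of link URLs; -width sets the wrap column
    lynx -dump -nolist -width 60 "$f" > "${f%.html}.txt"
  done
}

# Usage: dump_all /path/to/html/files
```

Each report.html in the directory ends up with a report.txt next to it; adjust the lynx options inside the loop the same way you would for a single file.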



