TechWhirl (TECHWR-L) is a resource for technical writing and technical communications professionals of all experience levels and in all industries to share their experiences and acquire information.
For two decades, technical communicators have turned to TechWhirl to ask and answer questions about the always-changing world of technical communications, such as tools, skills, career paths, methodologies, and emerging industries. The TechWhirl Archives and magazine, created for, by and about technical writers, offer a wealth of knowledge to everyone with an interest in any aspect of technical communications.
Subject: Re: Reasons for online
From: Ray Bruman <rbruman -at- RND -dot- RAYNET -dot- COM>
Date: Tue, 20 Dec 1994 14:21:01 PST
> > > And remember, all the good indexes
> > > in the world won't help anyone who doesn't know what he's looking for.
> >
> > But Glen, that's the point. We want to index things that are *not*
> > in the text. For instance, in my latest on-line help system, we have
> > topics referring to "building a project". New users may
> > think of "building a project" as entering, starting, creating, or making
> > a new project. Not all of these words can be found in the text.
> > Only a carefully crafted index can answer such a need.
> >
> I disagree completely. Only a stupid person would fail to find the answer.
> I would love to see some substantial (worthy of indexing) text about building
> a project which never uses the words "build" or "project." If the text is
> about "creating" a project, I just don't see the need for someone to waste
> the amount of time necessary to index everything similar to "create." It is
> up to the searcher to find, and we as writers can help, but we cannot handle
> the case of unmotivated idiots. My advice is -- don't bother.
> If I can't find something by looking at every word -- what I want ain't there.
That last sentence sums up the case for full-text retrieval in place of
manual indexing. It's a pithy statement, with some appeal. As it turns
out, it's surprisingly wrong.
I've seen one research report (sorry I can't give a citation right now)
on a $500,000 experiment with full-text retrieval on an enormous corpus
of legal material, all pertaining to one case. A big case, worth spending
this money on. They expected a hit rate of at least 80% to 90% when
finding relevant documents by keyword searching over the entire,
enormous corpus. Every single word had been entered in machine-readable
form. They were astounded to find they could get only about a 20%
hit rate. Language use is much more vague and indirect than we realize.
The "Indexing Problem" has been around ever since there were libraries,
and there is an enormous literature on the subject.
Of course, dear reader, you knew that.
Or else, you don't care to look into it.
Who am I writing to, and why?
Ray Bruman
Raynet Corp.
rbruman -at- raynet -dot- com
415-688-2325

In this establishment, we DO NOT DISCUSS
race, religion, politics, or nutrition.