Re: Reasons for online

Subject: Re: Reasons for online
From: Ray Bruman <rbruman -at- RND -dot- RAYNET -dot- COM>
Date: Tue, 20 Dec 1994 14:21:01 PST

> > > And remember, all the good indexes
> > > in the world won't help anyone who doesn't know what he's looking for.
> >
> > But Glen, that's the point. We want to index things that are *not*
> > in the text. For instance, in my latest on-line help system, we have
> > topics referring to "building a project". For the new user, they may
> > think of "building a project" as entering, starting, creating, or making
> > a new project. Not all of these words can be found in the text.
> > Only a carefully crafted index can answer such a need.
> >

> I disagree completely. Only a stupid person would fail to find the answer.
> I would love to see some substantial (worthy of indexing) text about building
> a project which never uses the words "build" or "project." If the text is
> about "creating" a project, I just don't see the need for someone to waste
> the amount of time necessary to index everything similar to "create." It is
> up to the searcher to find, and we as writers can help, but we cannot handle
> the case of unmotivated idiots. My advice is -- don't bother.

> If I can't find something by looking at every word -- what I want ain't there.

> ------------
> glen accardo glen -at- softint -dot- com
> Software Interfaces, Inc. (713) 492-0707 x122
> Houston, TX 77084


That last sentence sums up the case for full-text retrieval in place of
manual indexing. It's a pithy statement, with some appeal. As it turns
out, it's surprisingly wrong.

I've seen one research report (sorry I can't give a citation right now)
on a $500,000 experiment with full-text retrieval on an enormous corpus
of legal material, all pertaining to one case. A big case, worth spending
this money on. They expected a hit rate of at least 80% to 90% in
finding relevant documents by keyword search over the entire, enormous
corpus. Every single word had been entered in machine-readable form.
They were astounded to find they could get only about a 20% hit rate.
Language use is much more vague and indirect than we realize.

The "Indexing Problem" has been around ever since there were libraries,
and there is an enormous literature on the subject.

Of course, dear reader, you knew that.
Or else, you don't care to look into it.

Who am I writing to, and why?

Ray Bruman
Raynet Corp.
rbruman -at- raynet -dot- com
415-688-2325

In this establishment, we DO NOT DISCUSS
race, religion, politics, or nutrition.

