Re: hapax legomena-driven retrieval

Anders Sandberg (nv91-asa@nada.kth.se)
Fri, 25 Jul 1997 11:58:22 +0200 (MET DST)


On Thu, 24 Jul 1997, Andrea Gallagher wrote:

> At 11:02 AM 7/24/97 -0400, Carl Feynman wrote:
> >This might be called the 'hapax legomena strategy'. (A hapax legomenon is a
> >word or phrase that occurs only once in a given corpus).
>
> Aarrrgh! It's true. I have also been doing this, which explains why I can
> only find out information about topics I already know. For those areas I'm
> not familiar with, bizarre buzzwords never set in my memory. I don't even
> think I read them: "...crossed the radish, Rmpphmmm smmtmvm, with the
> cabbage, Brmmphmm olmmlmmm, trying to produce...".

Yes, it is amazing how hard it is to search for unfamiliar stuff. But
once you know a little, it becomes easy. Just another reason to get a
broad education - read everything you can get your hands on!

> I suppose it's an improvement that you need to be a domain expert to search
> effectively, instead of having to be a search system expert. It's
> interesting that we're seeing a proliferation of guide services (The Mining
> Company, Netguide, Excite & Yahoo, the Subject Clearinghouse), where you
> only need to find one site on a topic to learn the basics & the buzzwords.
> A nice split in information access methods between experts and novices.

The next step is of course that you try to describe what you are
looking for, and automatic thesaurus generators list buzzwords/hapax
legomena that you might want to search for (or do a search for each
of these, showing only the results which fit your descriptions). I
just read about one such system that automagically generated a
thesaurus of terms related to Drosophila melanogaster from a set of
research databases. Of course, one can use "find similar sites"
functions too.
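
To make the idea concrete, here is a rough Python sketch of the
"describe it, get rare words back" step. It is only an illustration
under my own assumptions, not the system I read about: the function
names (tokenize, hapax_legomena, suggest_search_terms) and the tiny
corpus are made up, and word overlap stands in for a real similarity
measure.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def hapax_legomena(documents):
    """Return the set of words that occur exactly once in the whole corpus."""
    counts = Counter(word for doc in documents for word in tokenize(doc))
    return {word for word, n in counts.items() if n == 1}

def suggest_search_terms(description, documents):
    """Suggest hapax legomena taken from documents that share at least
    one word with the user's description (a crude notion of 'related')."""
    hapaxes = hapax_legomena(documents)
    wanted = set(tokenize(description))
    suggestions = set()
    for doc in documents:
        words = set(tokenize(doc))
        if words & wanted:
            # Keep the rare words of this document, minus the ones
            # the user already knew to type.
            suggestions |= (words & hapaxes) - wanted
    return sorted(suggestions)

# Hypothetical toy corpus, only for illustration.
corpus = [
    "the fruit fly drosophila has four pairs of chromosomes",
    "the radish was crossed with the cabbage",
    "the fly has red eyes",
]
print(suggest_search_terms("fly genetics", corpus))
```

On this toy corpus the suggestions include "drosophila" but not
"radish", since the radish document shares no word with the
description. A real system would need stop-word filtering and a much
larger corpus before the rare words became useful search keys.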

-----------------------------------------------------------------------
Anders Sandberg Towards Ascension!
nv91-asa@nada.kth.se http://www.nada.kth.se/~nv91-asa/main.html
GCS/M/S/O d++ -p+ c++++ !l u+ e++ m++ s+/+ n--- h+/* f+ g+ w++ t+ r+ !y