Tuesday, March 24, 2009

I was going to give you the whole thing, but the first part is really boring. So I'll give you the more interesting section.

The difference between OPACs and the online search engine Google has a lot to do with who is doing the cataloging. With the OPAC, the cataloging, cross-referencing, and other important parts of the bibliographic record are all added manually by a cataloger. It’s a time-consuming and meticulous process that ensures that most of what is asked of the OPAC will come back with a response, even if it’s one that tells the user to look somewhere else or rephrase the query. According to the article “How does Google collect and rank results,” Google’s cataloging is mostly done by “spiders,” computer programs that crawl the web, asking servers for information about the websites they are hosting. These spiders then index the pages they collect by the words that appear on them. The article gives the example of civil and war. Of 100 documents or pages, perhaps 13 have the word civil in them, another 24 have the word war in them, and only 8 have both civil and war within the document. Of all 100 of those documents, the 8 that contain both words are the most relevant. Google further ranks the indexed pages by looking at where the query words come up within a page and how many times they show up. For example, if Jungle Cats is the search topic, a website with the term Jungle Cats in its title is probably more relevant than one that only says Jungle Cats once. Google also uses a program called PageRank, which assigns rank according to what links to a page and from where. For instance, if CNN.com links to an article or website about tax returns, that page would be ranked as more relevant than if it weren’t linked to at all, or even than a similar site linked to from three unknown sites.
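To make the civil and war example concrete, here is a minimal sketch of the kind of word index a spider might build. It is only an illustration, not Google’s actual code: the four sample documents and the function names `build_index` and `search_all` are made up for this post.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the documents for the first word, then narrow the set
    # down with each additional word.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, invented for illustration.
documents = {
    1: "the civil war began in 1861",
    2: "civil engineering handbook",
    3: "a history of war",
    4: "war and civil society",
}

index = build_index(documents)
both = search_all(index, "civil war")
print(sorted(both))
```

Documents 1, 2, and 4 contain civil, and documents 1, 3, and 4 contain war, so only documents 1 and 4 come back for the query civil war.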
The other HUGE difference between Google and the OPAC is simply a numbers game. When a user looks for an article or a book on the Midland OPAC, the results might number in the thousands at most, and that would be for a very general category, like music, or an entire Dewey section. Google will search for a refined topic like heart disease in Thailand and return results numbering in the millions. One caveat to the millions of hits is that some will only show up because the word “heart” or “disease” appears in the document. Google also returns all of these hits within a matter of seconds. The article explains that Google cut the time down by dividing the index created by the spider bots across many computers. Instead of searching through one large database of information, the search engine searches through many smaller ones. Then all of the machines together give the results for the query.
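The idea of splitting one big index across many machines can be sketched in a few lines. This is a toy version under stated assumptions: threads stand in for separate computers, the six documents are invented, and the names `make_shards`, `search_shard`, and `search_all_shards` are hypothetical, not anything from the article.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(documents, shard_count):
    """Split the documents into shard_count smaller collections by id."""
    shards = [dict() for _ in range(shard_count)]
    for doc_id, text in documents.items():
        shards[doc_id % shard_count][doc_id] = text
    return shards


def search_shard(shard, query):
    """Search one small collection for documents containing every query word."""
    words = query.lower().split()
    return {
        doc_id
        for doc_id, text in shard.items()
        if words and all(word in text.lower().split() for word in words)
    }


def search_all_shards(shards, query):
    """Ask every shard at the same time, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = pool.map(lambda shard: search_shard(shard, query), shards)
        hits = set()
        for partial in partial_results:
            hits |= partial
    return hits


# Hypothetical sample documents, invented for illustration.
documents = {
    0: "heart disease in thailand",
    1: "heart surgery",
    2: "disease outbreaks in thailand",
    3: "thailand travel guide",
    4: "heart disease risk factors",
    5: "jungle cats",
}

shards = make_shards(documents, 3)
print(sorted(search_all_shards(shards, "heart disease")))
```

Each shard only has to look through its own two documents, and the merged answer is the same as it would be from searching one large collection.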
Finally, Google does not rely on controlled vocabulary for its searches. Whereas an OPAC will have Library of Congress subject headings and an authority file with which to find author, series, and title placement, Google simply compiles search terms into a database. While Google’s method makes it easier for users to find their topics, it also opens a query up to a large number of unnecessary hits. If a user were to search Google for heart attack, any document containing “heart” or “attack” would eventually come up in the results. In the OPAC, on the other hand, myocardial infarction is the Library of Congress subject heading. A search on that heading returns any document cataloged under it, and a cross-reference leads users who search for “heart attack” to those same documents, but it does not return documents that contain just “heart” or “attack.” Controlled vocabulary allows users to make better searches, but it requires work on the part of the user to actively use it. That is the trade-off with Google: ease at the expense of relevance.
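Here is a small sketch of how a see-reference in an authority file changes the results compared with plain keyword matching. The three catalog records, their subject assignments, and the function names are all made up for illustration; a real OPAC’s authority file is far more elaborate.

```python
# A tiny stand-in for an authority file: the term a user might type,
# mapped to the authorized subject heading (a "see" reference).
SEE_REFERENCES = {"heart attack": "myocardial infarction"}

# Hypothetical catalog records, invented for illustration.
CATALOG = [
    {"title": "Recovering from a Heart Attack", "subjects": ["myocardial infarction"]},
    {"title": "Heart of Darkness", "subjects": ["fiction"]},
    {"title": "Attack Strategies in Chess", "subjects": ["chess"]},
]


def authorized_heading(term):
    """Follow the see-reference, or keep the term if it has none."""
    term = term.lower().strip()
    return SEE_REFERENCES.get(term, term)


def subject_search(catalog, term):
    """Return titles cataloged under the authorized heading for the term."""
    heading = authorized_heading(term)
    return [record["title"] for record in catalog if heading in record["subjects"]]


def keyword_search(catalog, term):
    """Return titles containing any single word of the term."""
    words = term.lower().split()
    return [
        record["title"]
        for record in catalog
        if any(word in record["title"].lower().split() for word in words)
    ]


print(subject_search(CATALOG, "heart attack"))
print(keyword_search(CATALOG, "heart attack"))
```

The subject search returns only the one book actually about heart attacks, while the keyword search returns all three titles because each contains “heart” or “attack.”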
Google is a remarkable tool to use for research. It makes searching the web easy and fast for the everyday user. It makes quick work of the vast information available on the internet and returns results to the user within seconds. The library OPAC, on the other hand, is a smaller database of information, but it offers controlled vocabulary, authority files, and, for the most part, verifiable information. Neither the OPAC nor Google is better than the other, but each can give users a different piece of the search query pie.
