Wednesday, April 15, 2009

Authority control and controlled vocabulary

In the age of instant gratification, the information agencies that house a vast majority of the world’s authoritative information are seeing less action. Thanks in part to Google, library patrons can now find their information on the internet with the help of keyword searching and a very sensitive mathematical algorithm that ranks websites. Libraries, formerly based within the wooden alcoves of card catalogs, are fantastically put together in terms of access points like author, title, and subject. In fact, the Library of Congress (LOC) regularly puts out thousands of new subject headings a year. It is through these headings that information can essentially be “tagged” with a controlled vocabulary term that provides access to it. Cookery for cookbooks and Myocardial infarction for heart attacks are just two examples of LOC subject headings. However, the simpler task is to search with a keyword. The five articles I describe in the annotated bibliography below all deal in a fundamental way with authority control or controlled vocabulary. Each makes the argument that the way of doing business in the information retrieval game depends upon authority control: the one ring to rule them all.

First, we’ll look at controlled vocabulary and its place within the OPAC. Controlled vocabulary is essentially an agreed-upon list of words and phrases that are used within the catalog in order to streamline the search process and ensure that all bibliographic records for a title are uniform. The most important place for controlled vocabulary is the subject heading. It is within the subject heading that books, articles, videos, songs, sheet music, art and any other piece of information can be linked together with other information of similar description. Music by Bach and Beethoven would be found together in classical music; Signs, Unbreakable, and The Village would be found under bad movies. The controlled vocabulary dictates what words are used to describe the information. In “Controlled vocabularies: implementation and evaluation,” Marshall shows how to use a controlled vocabulary not only with an online card catalog, but also in any setting where a controlled vocabulary is used. She explains that even attaching a controlled vocabulary to a full-text search would still involve false hits, but that giving “consideration to defining the environment” with regard to what exactly the controlled vocabulary will search “will yield more precise results” (Marshall 2006 p. 55). In order for controlled vocabulary to work, there needs to be a framework in which it can be contextualized. Thankfully, authority control gives controlled vocabulary a place to roam.
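To make the idea concrete, here is a minimal sketch of a controlled vocabulary as a lookup table. It is only an illustration, not how any real catalog system is implemented: the two authorized headings come from the examples earlier in this post, while the variant terms and the names `AUTHORITY_FILE` and `authorized_heading` are assumptions made up for the example.

```python
# Hypothetical miniature authority file: every variant term points to the
# single authorized heading. The two headings are the ones mentioned in
# this post; the variant terms are invented for illustration.
AUTHORITY_FILE = {
    "cookery": "Cookery",
    "cooking": "Cookery",
    "cookbooks": "Cookery",
    "myocardial infarction": "Myocardial infarction",
    "heart attack": "Myocardial infarction",
    "heart attacks": "Myocardial infarction",
}


def authorized_heading(term):
    """Return the authorized heading for a term, or None if it is not controlled."""
    return AUTHORITY_FILE.get(term.strip().lower())


print(authorized_heading("Heart attacks"))   # resolves to the authorized heading
print(authorized_heading("cardiac arrest"))  # not in this tiny vocabulary
```

The point of the sketch is that a patron can type whichever variant comes to mind, and the catalog still files everything under one agreed-upon term.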

Authority control will probably start a war someday. Not really, but the kinds of decisions the people in charge of authority control have to make cause countless people to get upset. Recently, the LOC stopped producing authority files for series. While this doesn’t seem like much, many libraries rely on the LOC to give them a sense of direction. In an article by Mirna Willer, the dealings of the international authority body IFLA are called into question. She asks whether we are ready to see authority control reach new heights in the new environments opening up to it, or whether we will sit back and wait for information technology to do the work for us later (2006 p. 56). The new heights and new environments she’s talking about are authority control taking part in art galleries, museums and archives, expanding to encompass more information than it has before. She leaves her question unanswered, but also leaves a stinging barb with the authority community regarding their unwillingness to take up this task.

Finally, the happy marriage between controlled vocabulary and authority control creates the OPAC. The online public access catalog is the lifeblood of the modern library. Searches live and die by how well information can be accessed on these systems. What three of the articles show is that there is as much need for authority control and controlled vocabulary now as there ever was. Gross and Taylor perform a study of whether keyword searches of the OPAC’s holdings would still net good results if those keywords could not rely upon the subject headings. (Subject headings, remember, are the happy union between a controlled vocabulary and an authority to determine that a particular topic will be labeled with that particular controlled term.) Gross and Taylor discover that as much as 36 percent of the returned information would not have been found if it weren’t for the keyword matching a subject heading or a cross reference to a subject heading. When taking into account foreign language materials, the percentage jumps to as much as 100 percent in some cases (2005 p. 216, 223). Thomas Mann’s articles make a similar argument, only Mann provides detailed search examples based on actual user queries. In his first article, Mann discusses the use of Google’s keyword searching with Google Paper, an online repository of digital books. Mann walks us step by step through how searching by keyword and searching by subject term yield different results, concluding that searches conducted with subject headings are far more appropriate than those without. Mann’s second article takes that small demonstration and creates a 30-page walkthrough on how to take a topic and break it down, dummy it up, look in different places and look along the same lines (Mann 2007 p. 11-13). Each of his techniques involves modifying the use of controlled vocabulary and relying upon the authority of the system to find relevant materials, rather than hoping that a keyword will catch the right set of information. It is because of the backbone of authority control and controlled vocabulary that Mann is able to better attend to the needs of his patrons, not because of the unfeeling machine giant Google and its keyword ranking system.
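The kind of loss Gross and Taylor measure can be sketched in a few lines. This is a toy illustration under stated assumptions, not a reconstruction of their method: the four records in `RECORDS` are invented, and the hypothetical `keyword_search` function simply checks whether the query appears in the title, optionally with the subject headings included in the searchable text.

```python
# Hypothetical catalog records; titles and headings are invented for illustration.
RECORDS = [
    {"title": "Heart attacks explained", "subjects": ["Myocardial infarction"]},
    {"title": "Recovering after myocardial infarction", "subjects": ["Myocardial infarction"]},
    {"title": "Der Herzinfarkt", "subjects": ["Myocardial infarction"]},
    {"title": "The joy of baking", "subjects": ["Cookery"]},
]


def keyword_search(query, records, include_subjects):
    """Return the titles of records whose searchable text contains the query."""
    needle = query.lower()
    hits = []
    for record in records:
        text = record["title"]
        if include_subjects:
            # Subject headings become part of what the keyword can match.
            text += " " + " ".join(record["subjects"])
        if needle in text.lower():
            hits.append(record["title"])
    return hits


with_headings = keyword_search("myocardial infarction", RECORDS, include_subjects=True)
without_headings = keyword_search("myocardial infarction", RECORDS, include_subjects=False)

# Records that were found only because a subject heading matched the keyword.
lost = [title for title in with_headings if title not in without_headings]
print(lost)
```

In this toy catalog, the record titled in plain English and the one titled in German both disappear once the subject headings are taken away, which is the same pattern the study reports for foreign language materials.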

What remains to be seen is what happens next. While authority control and controlled vocabulary are certainly a better means of searching for information, Google and search engines like it are here to stay, and they are competing directly for the hearts and minds of today’s scholars and today’s youth. Will the OPAC grow to incorporate the one-time, seamless search many of us are used to, or will the search engine giants take on their own controlled vocabulary and become an authority in and of themselves? And, as Willer asked, who will be at the forefront of this new move to innovate alongside the enemy? Who will move first, and who will survive when the smoke clears? We shall all have to wait and see.
