Saturday, October 29, 2011

DITA Assignment 1

Information Retrieval (and subsequently the means and systems used to retrieve information) is a pervasive force in our lives. We deal with Information Retrieval on a day-to-day basis probably without even being aware of it most of the time. Do you need more information about a topic? Want to find out when your favorite band’s new CD is being released? We use Information Retrieval Systems every day to find out more about a variety of topics.

Given that we utilize it so often, it is not surprising that we do not really appreciate the underlying complexity of Information Retrieval and its theoretical components. Jansen and Rieh (2010) identify no fewer than seventeen theoretical constructs of this field. They note that, “as a field of study, Information Retrieval is well established, with its own conferences and journals focused exclusively on Information Retrieval research” (Jansen and Rieh 2010, pg. 1517).

What exactly is Information Retrieval? Jansen and Rieh define it as: “finding material of an unstructured nature that satisfies an information need from within large collections stored on computers” (2010, pg. 1517). A user first has a need for a certain piece of information and wishes to find it. This can be done by submitting a query, whose terms the system matches against those found in the documents on the computer or on the web.
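This matching of query terms against a collection can be sketched in a few lines of Python. The documents and queries below are invented examples, and this exact-match approach is only a minimal illustration, not how a real search engine works:

```python
# Minimal sketch of query-term matching against a toy document
# collection (all documents and terms here are invented examples).

documents = {
    "doc1": "new album release date announced by the band",
    "doc2": "band tour dates and ticket information",
    "doc3": "history of the record label",
}

def match(query: str) -> list[str]:
    """Return ids of documents containing every query term."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in documents.items()
            if all(term in text.lower().split() for term in terms)]

print(match("band release"))  # → ['doc1']
```

Even this toy version shows the basic shape of retrieval: the user's information need is expressed as terms, and the system returns the documents whose contents satisfy them.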

Broder (2002), in his article “A Taxonomy of Web Search,” observes that there are three types of queries users might have: informational queries, navigational queries, and transactional queries. He expands this idea by saying, “the need behind a web search is often not informational – it might be navigational (give me the url of the site I want to reach) or transactional (show me sites where I can perform a certain transaction, e.g. shop, download a file, or find a map)” (Broder 2002). Thus, one can see that information retrieval can be used to find many different bits of information. It stands to reason that the way one searches would change with each separate need. Indeed, Yuan and Belkin based an experiment and subsequent journal article on Belkin’s idea that:
an information-seeking episode could be construed as a sequence of different types of interactions with information, or different ISSs [or information-seeking strategies], each of which could be ‘optimally’ supported by different combinations of various retrieval techniques. Thus, there would be different choices of such techniques for best support of any particular ISS. (Yuan and Belkin 2010, pg. 1544)
Users must therefore adapt their information retrieval techniques to better fit their specific need. More experienced users understand the importance of having a specific search plan but also of being able to adapt that plan: in an experiment by Navarro-Prieto et al. (1999), novice users said they did not have a set plan for searching and could not explain the reason behind their actions or choices when searching, whereas experienced users did have a better understanding and approached queries with a plan. Success depends upon user planning and flexibility, and users who have a greater understanding of information retrieval exhibit these qualities more than novices. Hölscher and Strube (2000) come to the same conclusion, also noting the flexible nature of the system. Each person, having a separate and distinct strategy, will get different results from the system: “participants follow different paths trying to solve given tasks and hardly ever face exactly the same pages of results or have to reformulate the exact same search queries as another participant” (Hölscher and Strube 2000). Thus, many factors go into information retrieval, and users will have different experiences depending on their level of experience, their search methods (including their choice of search engine), and their way of interacting with the information presented by the search engine.

How, then, should one approach a query and what methods should be employed? Holt and Miller (2009) state that “more frequently occurring terms are less distinguishing than less frequently occurring terms. The inverse term frequency of the search terms can be used as weights to rank the document.” In other words, if one were to use less common words, one could more easily pinpoint the specific information required by the query. This is an extension of the principle that stop words (such as “the,” “an,” etc.) should be removed from queries. The less common the word, the more specific the result is. Another method is to use stems, as Robertson and Jones (1997) discuss: “terms are generally stems (or roots), rather than full words, since this means that matches are not missed through trivial word variation, as with singular/plural forms.” This returns results for various forms of a word, so that items are included that would otherwise have been missed.
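The weighting idea that Holt and Miller describe can be illustrated with a short sketch. The collection and stop list below are invented examples, and the formula shown (log of the number of documents over the number containing the term) is one common way of computing such weights, not necessarily theirs:

```python
import math

# Illustrative sketch of inverse document frequency weighting with
# stop-word removal; the collection and stop list are invented examples.

documents = [
    "the band released the new album",
    "the album review praised the band",
    "the weather forecast for the weekend",
]
STOP_WORDS = {"the", "an", "a", "for"}

def idf(term: str) -> float:
    """Rarer terms get higher weights: idf = log(N / df)."""
    df = sum(term in doc.split() for doc in documents)
    return math.log(len(documents) / df) if df else 0.0

def weighted_terms(query: str) -> dict[str, float]:
    """Drop stop words, then weight each remaining query term."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    return {t: idf(t) for t in terms}

print(weighted_terms("the new album"))
```

Here “new” (which appears in only one document) receives a higher weight than “album” (which appears in two), while “the” is discarded entirely, matching the intuition that rarer terms do more to distinguish relevant documents.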

The exercises performed in Session 4 of the Digital Information Technologies and Architectures course provide a perfect example of how all of these factors can affect a user’s queries. In this exercise, students were asked to use the Google and Bing search engines to find information on a list of 10 topics. Students were also asked to try a variety of different search methods in each of these searches (e.g. natural language queries, Boolean operator queries, phrases, and operators such as “+” and “–”) to see how these changed the results of the searches. Both search engines offered adequate results on most searches, although in some cases these results differed between the search engines whereas in other cases the results were much the same between the two. Using various types of queries sometimes changed the results; however, most of the time there were at least one or two pages in the results list that remained after the change in query type. It is interesting to note that when the same type of query was used in both Google and Bing, the precision sometimes differed noticeably between the two search engines. Precision, measured as the proportion of relevant documents within the total number of documents retrieved (Morville and Rosenfeld 2006, pg. 159), was in this exercise calculated by examining the first five results of each query. This exercise showed that, although we do not actively notice it, our search methods and the way in which we approach queries greatly affect our success rate with information retrieval. It also demonstrated just how intricate these details are and how much we have been programmed not to really think about each step when approaching the query as a whole. We make search plans, re-evaluate these plans, and adapt our searches accordingly in a matter of seconds with no real effort. Perhaps these ingrained behaviors have made us all more “expert” at information retrieval.
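The precision measure used in the exercise is simple enough to express directly. The relevance judgements in the example call below are invented for illustration, not taken from the actual Session 4 results:

```python
# Sketch of the precision-at-5 calculation used in the exercise;
# the relevance judgements below are invented for illustration.

def precision_at_k(relevant: list[bool], k: int = 5) -> float:
    """Fraction of the first k results judged relevant."""
    top = relevant[:k]
    return sum(top) / len(top)

# e.g. a query where 4 of the first 5 results were relevant:
print(precision_at_k([True, True, False, True, True]))  # → 0.8
```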

-----------------------------------------------------------------------------------------
***The URL of this blog entry is: http://duchyinwonderland.blogspot.com/2011/10/dita-assignment-1.html ***

Bibliography

Hölscher, C., Strube, G., 2000. Web Search Behaviour of Internet Experts and Newbies. Available from: http://www9.org/w9cdrom/81/81.html [Accessed 26 October 2011].

Holt, J.D., Miller, D.J., 2009. An Evolution of Search. Bulletin of the American Society for Information Science & Technology, 6 (1), 11-15. Available from: http://www.ebscohost.com [Accessed 24 October 2011].

Jansen, B.J., Rieh, S.Y., 2010. The Seventeen Theoretical Constructs of Information Searching and Information Retrieval. Journal of the American Society for Information Science & Technology, 61 (8), 1517-1534. Available from: http://www.ebscohost.com [Accessed 24 October 2011].

Morville, P., Rosenfeld, L., 2006. Information Architecture for the World Wide Web: Designing Large-Scale Web Sites. 3rd ed. Cambridge: O’Reilly.

Navarro-Prieto, R., Scaife, M., Rogers, Y., 1999. Cognitive Strategies in Web Searching. In: Proceedings of the 5th Conference on Human Factors & the Web, 1999. Available from: http://zing.ncsl.nist.gov/hfweb/proceedings/navarro-prieto/index.html [Accessed 26 October 2011].

Robertson, S.E., Jones, K.S., 1997. Simple, Proven Approaches to Text Retrieval, University of Cambridge Technical Note, TR356. Available from: http://moodle.city.ac.uk [Accessed 25 October 2011].

Yuan, X., Belkin, N.J., 2010. Investigating Information Retrieval Support Techniques for Different Information-Seeking Strategies. Journal of the American Society for Information Science & Technology, 61 (8), 1543-1563. Available from: http://www.ebscohost.com [Accessed 24 October 2011].
