Saturday, October 29, 2011

DITA Assignment 1

Information Retrieval (and, by extension, the means and systems used to retrieve information) is a pervasive force in our lives. We deal with Information Retrieval every day, usually without even being aware of it. Do you need more information about a topic? Want to find out when your favorite band’s new CD is being released? We use Information Retrieval Systems every day to find out about a wide variety of topics.

Given that we use it so often, it is not surprising that we rarely appreciate the underlying complexity of Information Retrieval and its theoretical components. Jansen and Rieh (2010) identify no fewer than seventeen theoretical constructs in this field. They note that, “as a field of study, Information Retrieval is well established, with its own conferences and journals focused exclusively on Information Retrieval research” (Jansen and Rieh 2010, pg. 1517).

What exactly is Information Retrieval? Jansen and Rieh define it as: “finding material of an unstructured nature that satisfies an information need from within large collections stored on computers” (2010, pg. 1517). A user first has a need for a certain piece of information and wishes to find it. One way to do this is to submit a query, whose terms are then matched against those found in documents stored on the computer or on the web.

Broder (2002), in his article “A Taxonomy of Web Search,” observes that users’ queries fall into three types: informational, navigational, and transactional. He expands on this idea, saying, “the need behind a web search is often not informational – it might be navigational (give me the url of the site I want to reach) or transactional (show me sites where I can perform a certain transaction, e.g. shop, download a file, or find a map)” (Broder 2002). Thus, information retrieval can be used to find many different kinds of information, and it stands to reason that the way one searches would change with each separate need. Indeed, Yuan and Belkin based an experiment and a subsequent journal article on Belkin’s idea that:
an information-seeking episode could be construed as a sequence of different types of interactions with information, or different ISSs [or information-seeking strategies], each of which could be ‘optimally’ supported by different combinations of various retrieval techniques. Thus, there would be different choices of such techniques for best support of any particular ISS. (Yuan & Belkin 2010, pg. 1544)
Users must therefore adapt their information retrieval techniques to fit their specific need. More experienced users understand the importance of having a specific search plan while remaining able to adapt that plan. In an experiment by Navarro-Prieto et al. (1999), novice users said they did not have a set plan for searching and could not explain the reasons behind their actions or choices, whereas experienced users had a better understanding of the process and approached queries with a plan. Success depends on user planning and flexibility, and users with a greater understanding of information retrieval exhibit these qualities more than novices do. Hölscher and Strube (2000) reach the same conclusion, also noting the flexible nature of the system. Because each person has a separate and distinct strategy, each will get different results from the system: “participants follow different paths trying to solve given tasks and hardly ever face exactly the same pages of results or have to reformulate the exact same search queries as another participant” (Hölscher and Strube 2000). Thus, many factors go into information retrieval, and users will have different experiences depending on their level of experience, their search methods (including their choice of search engine), and the way they interact with the information the search engine presents.

How, then, should one approach a query, and what methods should be employed? Holt and Miller (2009) state that “more frequently occurring terms are less distinguishing than less frequently occurring terms. The inverse term frequency of the search terms can be used as weights to rank the document.” In other words, using less common words makes it easier to pinpoint the specific information the query is after. This extends the common practice of removing stop words (such as “the,” “an,” etc.) from queries: the less common the word, the more specific the result. Another method is to use stems, as Robertson and Jones (1997) discuss: “terms are generally stems (or roots), rather than full words, since this means that matches are not missed through trivial word variation, as with singular/plural forms.” Matching on stems returns results containing various forms of a word, so items are shown that would otherwise have been missed.
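To make these three ideas concrete, the following is a toy sketch in Python. It assumes a tiny invented document collection, a hand-picked stop-word list, and a deliberately crude plural-stripping function standing in for a real stemmer (such as Porter's). Each query term is weighted by idf = log(N / df), where N is the number of documents and df the number of documents containing the term (the standard formulation is usually called inverse document frequency), so rarer terms contribute more to a document's score. The names `terms`, `idf_weights`, and `rank` are illustrative only and do not come from any of the cited sources.

```python
import math

# Hypothetical stop-word list; real systems use much longer lists.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to"}


def crude_stem(word: str) -> str:
    """Strip a plural 's' so 'band' and 'bands' match (a stand-in for a real stemmer)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def terms(text: str) -> list[str]:
    """Lower-case the text, drop stop words, and stem what remains."""
    return [crude_stem(w) for w in text.lower().split() if w not in STOP_WORDS]


def idf_weights(docs: list[str]) -> dict[str, float]:
    """idf(t) = log(N / df(t)): terms found in fewer documents get larger weights."""
    n = len(docs)
    df: dict[str, int] = {}
    for doc in docs:
        for t in set(terms(doc)):
            df[t] = df.get(t, 0) + 1
    return {t: math.log(n / count) for t, count in df.items()}


def rank(query: str, docs: list[str]) -> list[tuple[float, str]]:
    """Score each document by summing the idf of the query terms it contains."""
    idf = idf_weights(docs)
    q = set(terms(query))
    scored = []
    for d in docs:
        d_terms = set(terms(d))
        score = sum(idf.get(t, 0.0) for t in q if t in d_terms)
        scored.append((score, d))
    # Highest score first; ties keep their original order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "the band releases a new album",
    "new shops in the city",
    "tour dates for the bands",
]
for score, doc in rank("bands new album", docs):
    print(f"{score:.3f}  {doc}")
```

The two single-term matches tie, since “new” and “band” each appear in two of the three documents; only the rarer term “album” separates the top result. Note also that the stemming step is what lets the query term “bands” match “band” in the first document.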

The exercises performed in Session 4 of the Digital Information Technologies and Architectures course provide a good example of how all of these factors can affect a user’s queries. In this exercise, students were asked to use the Google and Bing search engines to find information on a list of 10 topics. Students were also asked to try a variety of search methods for each topic (e.g. natural language queries, Boolean operator queries, phrases, and operators such as “+” and “–”) to see how these changed the results. Both search engines returned adequate results for most searches; in some cases the results differed between the two engines, while in others they were much the same. Changing the query type sometimes changed the results, but most of the time at least one or two pages from the original results list remained after the change. It is interesting to note that when the same type of query was used in both Google and Bing, precision sometimes differed noticeably between the two search engines. Precision is the number of relevant documents retrieved divided by the total number of documents retrieved (Morville and Rosenfeld 2006, pg. 159); in this exercise, it was calculated by examining the first five results of each query. This exercise showed that, although we do not actively notice it, our search methods and the way we approach queries greatly affect our success with information retrieval. It also showed just how intricate these details are and how much we have been conditioned not to think about each step when approaching a query as a whole. We make search plans, re-evaluate these plans, and adapt our searches accordingly in a matter of seconds with no real effort. Perhaps these ingrained behaviors have made us all more “expert” at information retrieval.
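Precision measured over only the first few results is often called precision at k (here, k = 5). A minimal Python sketch follows, assuming each result's relevance is recorded as True or False in ranked order; the function name is illustrative, and the judgments in the example are invented rather than taken from the exercise.

```python
def precision_at_k(judgments: list[bool], k: int = 5) -> float:
    """Share of the top-k ranked results judged relevant.

    judgments[i] is True if the result at rank i + 1 was judged relevant.
    Dividing by k (not by the number of results returned) treats missing
    results as not relevant, which is one of several conventions in use.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(judgments[:k]) / k


# Hypothetical judgments for one query's first six results.
print(precision_at_k([True, True, False, True, False, True]))  # 3 of the top 5 -> 0.6
```

Only the first five judgments count, so the relevant sixth result does not change the score.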

-----------------------------------------------------------------------------------------
***The URL of this blog entry is: http://duchyinwonderland.blogspot.com/2011/10/dita-assignment-1.html ***

Bibliography

Hölscher, C., Strube, G., 2000. Web Search Behaviour of Internet Experts and Newbies. Available from: http://www9.org/w9cdrom/81/81.html [Accessed 26 October 2011].

Holt, J.D., Miller, D.J., 2009. An Evolution of Search. Bulletin of the American Society for Information Science & Technology, 6 (1), 11-15. Available from: http://www.ebscohost.com [Accessed 24 October 2011].

Jansen, B.J., Rieh, S.Y., 2010. The Seventeen Theoretical Constructs of Information Searching and Information Retrieval. Journal of the American Society for Information Science & Technology, 61 (8), 1517-1534. Available from: http://www.ebscohost.com [Accessed 24 October 2011].

Morville, P., Rosenfeld, L., 2006. Information Architecture for the World Wide Web: Designing Large-Scale Web Sites. 3rd ed. Cambridge: O’Reilly.

Navarro-Prieto, R., Scaife, M., Rogers, Y., 1999. Cognitive Strategies in Web Searching. In: Proceedings of the 5th Conference on Human Factors & the Web. Available from: http://zing.ncsl.nist.gov/hfweb/proceedings/navarro-prieto/index.html [Accessed 26 October 2011].

Robertson, S.E., Jones, K.S., 1997. Simple, Proven Approaches to Text Retrieval. University of Cambridge Technical Note, TR356. Available from: http://moodle.city.ac.uk [Accessed 25 October 2011].

Yuan, X., Belkin, N.J., 2010. Investigating Information Retrieval Support Techniques for Different Information-Seeking Strategies. Journal of the American Society for Information Science & Technology, 61 (8), 1543-1563. Available from: http://www.ebscohost.com [Accessed 24 October 2011].

Tuesday, October 25, 2011

Recap of Week 3 & 4 Classes

I was a slacker and didn't do a post for week 3, so I'm gonna just combine weeks 3 & 4.

Digital Information Technologies and Architectures (INM 348)
Week 3 involved looking at databases, which were created as a way to store data centrally and make it easier to access. Our exercise for this week was to construct 10 database queries. I found this exercise quite interesting. It was a challenge to learn how exactly to do this, but once I got the hang of it, it was rather fun. You definitely get a sense of accomplishment from applying what you've learned to specific cases and getting the right results. The standard format for queries is as follows:
SELECT columns
FROM tables
WHERE something is true;
So for example, the answer I got for problem one (to list the publisher, company and city of publishers in New York) was:
SELECT pubid, name, company_name, city
FROM publishers
WHERE city = 'New York';
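If you want to play with this kind of query outside the lab, here's a rough sketch using Python's built-in sqlite3 module. The table layout and the sample publishers are made up (only the column names come from my answer above), and I added ORDER BY so the rows come back in a predictable order:

```python
import sqlite3

# Throwaway in-memory database with a made-up publishers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publishers ("
    "pubid INTEGER PRIMARY KEY, name TEXT, company_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO publishers VALUES (?, ?, ?, ?)",
    [
        (1, "Acme Books", "Acme Media Group", "New York"),
        (2, "Thames Press", "Thames Holdings", "London"),
        (3, "Hudson House", "Hudson Publishing Inc", "New York"),
    ],
)

# Same SELECT / FROM / WHERE shape as the answer above.
# In standard SQL, string values go in single quotes.
rows = conn.execute(
    "SELECT pubid, name, company_name, city "
    "FROM publishers "
    "WHERE city = 'New York' "
    "ORDER BY pubid"
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Only the two New York publishers come back; the London one is filtered out by the WHERE clause.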
Week 4 was about Information Retrieval, which is basically what you do when you search for information via Google or other search engines. Precision of retrieval measures the proportion of retrieved documents that are relevant. You get this figure by dividing:
\[ \text{Precision} = \frac{\text{Relevant documents retrieved}}{\text{Total documents retrieved}} \]
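Recall measures the proportion of all the relevant documents in the database that were actually retrieved. It tends to trade off against precision (pushing one up usually pulls the other down) and is measured as follows:
\[ \text{Recall} = \frac{\text{Relevant documents retrieved}}{\text{Total number of relevant documents in the database}} \]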
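Here's a small Python sketch of both formulas, treating the retrieved and relevant documents as sets of document IDs. The IDs are hypothetical, and returning 0.0 when a set is empty is just one convention for avoiding division by zero:

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Relevant documents retrieved / total documents retrieved."""
    if not retrieved:
        return 0.0  # nothing retrieved: treat precision as 0 by convention
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Relevant documents retrieved / total relevant documents in the database."""
    if not relevant:
        return 0.0  # no relevant documents exist: treat recall as 0 by convention
    return len(retrieved & relevant) / len(relevant)


retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d7", "d8", "d9"}
print(precision(retrieved, relevant))  # 2 / 4 = 0.5
print(recall(retrieved, relevant))     # 2 / 5 = 0.4
```

If a search returned every document in the database, recall would hit 1.0 while precision would usually fall, which is the trade-off between the two.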

Library and Information Science Foundation (INM 301)
The topic for Week 3 was The History of Library and Information Science. Self-explanatory. We went from the tasks of early librarians (organizing, adding titles, listing parts of documents, listing documents on shelves) to the development of a classification system of the world's knowledge in the form of a draft encyclopedia by Francis Bacon (1620) to Martin Schrettinger first coining the phrase "library science" in 1808, etc. The advent and role of special libraries were mentioned, and the steps of the information chain were listed: authorship/creation, dissemination/publication, organization, indexing and retrieval, and finally use.
Week 4 was about Information, Documents, and Collections. The three paradigms discussed were: system paradigm, cognitive paradigm, and socio-cognitive paradigm. We looked at Shannon and Weaver's Mathematical Theory of Communication, which quantifies information and how much of it can be transmitted over a channel. Karl Popper (he keeps popping up in our discussions...get it? Hahaha. Yeah, lame, I know) proposed a theory of "Three Worlds": the first is the physical world, the second is the mental world of each individual, and the third is the world of communicable knowledge. We also talked about the four levels of documents: 1) works (e.g. Shakespeare's Hamlet); 2) expressions (the English text of Hamlet); 3) manifestations (a specific edition of the English text of Hamlet); and 4) items (this copy in my hand).
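Shannon's theory measures information in bits. As a small illustration (my own addition, not from the lecture slides), the entropy of a source, H = -Σ p·log2(p), gives the average information per symbol: a fair coin flip carries exactly 1 bit, and a more predictable source carries less. Channel capacity, the figure mentioned in the lecture, builds on this same measure. The probabilities below are just made-up examples, and entropy_bits is a name I picked:

```python
import math


def entropy_bits(probabilities: list[float]) -> float:
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero-probability outcomes are skipped."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


print(entropy_bits([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy_bits([0.25] * 4))  # four equally likely symbols: 2.0 bits
print(entropy_bits([0.9, 0.1]))  # biased coin: less than 1 bit
```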

Spanish
We went over adjectives for things and for people (para cosas y para personas). We also covered gerunds (in English, the -ing verbs), so "talking" = hablando. The gerund is the same for every person; there's no changing the verb as there would be in the present tense (hablo, hablas, habla, etc.). Yolanda asked us: ¿Cómo vienes a la uni? My answer: Vengo a pie / Vengo andando. To which she replied: ¡Qué suerte! ("How do you come to university?", "I walk", "How lucky!").


Information Management and Policy (INM 341)
This was our two weeks of Information/Copyright Law. So...laws. What did I get out of it? England doesn't have a written constitution. America wins. I think their legal system is even more confusing than ours in America. How does that happen? I'll not hurt my brain again with all the breaches of confidence and actions in tort and contracts and whatnot. There are some laws. Don't break them. The end.

Research Evaluation and Communication (INM 356)
Week 3 was about Experimenting, Observing, & Evaluating. Experimenting is usually objective, positivist, and scientific, while observation is typically subjective and based on interpretation. Simple enough.
Week 4 was about Surveys, which can be split into interviews, focus groups, questionnaires, Delphi studies, and critical incidents. Interviews can be unstructured/naturalistic, semi-structured, or structured and can take place face-to-face, over the phone, or online/via chat. Questions can be closed or open. Focus groups usually involve 5-10 participants and allow group members to spark ideas off each other. A Delphi study involves a group interacting over time without ever actually coming together. It usually lasts 2 or 3 rounds to reach a consensus. Critical incidents are simply case studies. Sampling for studies can be complete, random, systematic/stratified, or convenience.

Woo! The end. No lectures next week so I can put off the recap of week 5 for a while, as I do. Now back to working on my DITA assignment! Slowly but surely, that one.

Monday, October 17, 2011

Adventure Update

I've been neglecting my blog. So sad. I've been busy and I tend to procrastinate like whoa. What have I been doing in my free time, you ask?



A few weeks ago, I went to a boat party along the Thames at night. There were some great views of the city at night, as you can see. A couple of weekends ago, I went out to Portobello Road Market and bought some antique trinkets to send back home. A lot of that stuff they try to sell is EXPENSIVE. Luckily I found a dude with random old shiny things more in my price range. After wandering around the market - and having a little ham sandwich and a cupcake for lunch - I passed by a book store and got two more books. I really need to stop buying so many books. I just...we don't have bookstores in Waycross and naturally I have to go in any I see...and of course I see something I just have to have in every one. I might need a support group for this.

After the bookstore, I walked to Hyde Park and spent a couple of hours wandering around. I'm not really sure what the statue thing in the picture here is all about but it was huge (and pretty) so I'm just gonna leave it here. I took a little reading break in the park because my feet were starting to hurt. Unfortunately, the tube system was partially down and I couldn't find a functional bus stop either so once I got from the Marble Arch station to Bank, I had to walk all the way home. That's like 30 more minutes of walking. Uphill. In the snow.

The weekend after that, June and I went to Brick Lane. It was nice. Lots of "vintage" clothes and such and lots of Indian food (which we didn't partake of since we were once again on a quest for pizza - which my phone helped us finally locate...we have a history of not being able to find pizza). Bought another book that day! It was by Sir Arthur Conan Doyle. You can't pass that up.

This past Saturday, we had lunch at a French restaurant with June's mum and Barry. Then we had some amazing adventures traveling on the over-crowded Victoria and District lines, and also great fun trying to find work clothes for me (I gave up). At one point we had to cross the road but there was a fence/barrier and the light turned green for oncoming traffic. What did I do? The most logical thing, of course: ran screaming across the road and jumped over the barrier. June did the safe thing and waited for traffic to pass, then walked around said barrier. On our way back, we stopped in King's Cross station and found Platform 9 3/4. I look like a dork here, but that's okay. I own it. We rode Boris's bikes home. Oh my heart attacks. I obviously don't do well with a heavy bicycle in heavy traffic. Uphill. I survived though. Maybe I'll forgive all the cyclists for nearly killing me all those times. It's harder to be a cyclist than a pedestrian. Definitely an adventure, that one.

Later Saturday night, we went to Ministry of Sound for Basement Jaxx. It was fun. Not so fun getting there. The tube was out and so were half the bus stops. We had no problem getting back home though, thank God. Maybe you just need to do all your traveling at 4am. Good to know.

Sunday, October 9, 2011

Recap of Week 2 Classes

Digital Information Technologies and Architectures (INM 348)
We talked about the Internet and the World Wide Web. Honestly, I was a little foggy on the difference because they are used pretty widely as synonyms. The metaphor that was used helped me understand it perfectly though - if the Internet is the road, then the Web is the car. The Internet carries more than just the Web (email, for example). We talked about domain names and URLs, etc. Our lab exercise involved us working with HTML to create webpages. I did everything all right until it got to putting the pages on the web. Somehow I messed that up and I'll have to figure it out in lab tomorrow. We also tinkered with CSS, which I need to look into further because my understanding of it is much more limited than my understanding of HTML.

Library and Information Science Foundation (INM 301)
The topic of this week's lecture was Intellectual Tools. We examined how documents and knowledge were organized from ancient times to now. In Mesopotamia, texts were indexed by their first few words, as titles were not used. Some of the most notable ancient libraries had very large collections. The Palace Library at Ebla, Syria (2600-2300 BCE), for example, had two storerooms that housed 17,000 clay tablets. That's a lot of tablets. Assurbanipal's palace library at Nineveh didn't last long: 650-612 BCE. Here they used identification tags on texts that listed the location (jar, shelf, room). Skipping some things and jumping forward a bit now...the advent of the printing press allowed more copies of texts to be made and collected. Encyclopedias began to show up in the 18th Century, around the time of the Enlightenment. The Library of Congress was founded in 1800. Panizzi's "91 rules" served as the cataloging code for the British Museum; this was paralleled by Jewett at the Smithsonian. It was really interesting to see all the different ways of classification and the evolution of library organization over the ages.

Spanish
I was not as intimidated this time. Even though I hadn't managed to study much, I still didn't end up looking completely brainless. Yay! We were asked to write and discuss what we did that day. The professor took my work up and projected it on the overhead. Luckily, I only made a couple of mistakes. I said "en la mañana" and "en el tardes" instead of saying "por la mañana" and "por la tarde" ("in the morning" and "in the afternoon"). Not so bad. So my final product looked a bit like this: "Por la mañana, voy a la oficina de correos. Envío tres paquetes a los Estados Unidos. Envío un paquete a mi madre y dos paquetes a mis amigos. Por la tarde, estudio." ("In the morning, I go to the post office. I send three packages to the United States. I send one package to my mother and two packages to my friends. In the afternoon, I study.") We then covered the difference between the verbs ser and estar. Our last exercise was to ask some classmates basic questions (what's your name, where are you from, what are you studying, etc).

Information Management and Policy (INM 341)
This week, we covered information overload. Does anyone else in the Library Science course (or the others) feel like they're suffering from this? I am. I feel like there are about a billion related works on the suggested reading lists that I could be reading. I just don't know which ones to pick and there's no way I can manage to read them all. I will need to work on which ones I choose and how I handle the seeming overload. I found a lot of this lecture interesting. For example, Reuters did a study in 1997 called "Dying for Info" in which 2/3 of the people surveyed felt that information overload caused loss of job satisfaction, 2/3 felt it damaged their personal relationships, and 1/3 felt it damaged their health. Schwartz's "paradox of choice" tells us that 10 is the highest number of possible choices we can deal with and still make a rational choice. Perhaps this is why I'm terrible at making decisions. Not my fault! Too many choices! Some possible coping tactics for information overload are: information avoidance, "if it's important they'll tell me," and satisficing (just choosing based on knowing a little but not everything; in other words, settling). I also felt the "handle only once" and "fear the 'might be useful'" tactics would help in my life, as I tend to keep things that could maybe be important later (but not really) and then wonder why I have so much junk on my desk and in my life. There are also some possibly helpful programs mentioned in a suggested article, one of which (called LeechBlock) limits the time you spend on certain websites each day.

Research Evaluation and Communication (INM 356)
This week, we looked at methods of desk research. Desk research forms a part of all research and is the only method used in some studies. Types of desk research are: literature review, meta-analysis, conceptual analysis, historical analysis, content analysis, discourse analysis, and bibliometrics. Meta-analysis gives an overall picture by examining results of several studies. Conceptual analysis aims to clarify terms and concepts. Discourse analysis focuses on how language is used. Bibliometrics examines patterns in recorded information, such as size and growth of literatures and changes in communication patterns. The others are self-explanatory, I think.

Whew! I've got some textbooks already, as I've mentioned. I have Knowledge Management: An Integrated Approach (2nd Edition) by Ashok Jashapara, as was suggested for our Info Management and Policy module. I'm on chapter 3 now. I like this one. It's easy to follow and covers a lot of the stuff we talk about in lectures. I'm about to start chapter 6 of Information Architecture for the World Wide Web (3rd Edition) by Morville and Rosenfeld. This one was suggested for DITA. It gets off to a very slow start but I think it might finally be picking up the pace a bit now. At least, I hope it gets progressively better. Week 3 starts tomorrow!!

Sunday, October 2, 2011

Recap of Week 1 of Classes

I'm a little late on this one but I thought I would try throwing all my thoughts on all my classes into one post. It might be a bit chaotic and I'm not sure if my terrible excuse for a memory can handle going all the way back to Monday. I'll give it a shot though.

1) Digital Information Technologies and Architectures (DITA) (INM 348)
First class. 9am on Monday. Excellent. We talked about computers and what they do, binary, and file types. Not that difficult. I was a little rusty on the binary thing but it was easy enough to grasp. The whole ASCII part, though, with binary used to represent letters and whatnot, is certainly a little tougher than just 00000001 = 1 and 00000010 = 2, etc. I haven't really looked into it much. In lab we did an exercise in which we created a document and made formatting changes to it, then opened it in Notepad to see what the formatting did to the document. Whoa. Not English. Lots of symbols and undecipherable (by me) gibberish. I have to say, however, that it is amazing how very complicated even the smallest changes can be underneath the surface. We then made our blogs. Simple enough.
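For what it's worth, the ASCII part turns out to be a lookup plus the same binary: each character has a code number (A is 65, for instance), and that number is stored in binary like any other. Here's a quick Python sketch using the built-in ord() and format() functions; to_binary is just a name I made up, and it only handles plain ASCII text:

```python
def to_binary(text: str) -> list[str]:
    """Return the 8-bit binary form of each character's ASCII code."""
    if not text.isascii():
        raise ValueError("this sketch only handles ASCII characters")
    # ord() gives the character's code number; "08b" pads it to 8 binary digits.
    return [format(ord(ch), "08b") for ch in text]


print(format(1, "08b"), format(2, "08b"))  # 00000001 00000010
print(to_binary("Hi"))  # ['01001000', '01101001']
```

So "H" is just the number 72 and "i" is 105, written out in binary.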

2) Library and Information Science Foundation (INM 301)
The lecture was on the History of Information. The five information ages are as follows:
• the age of spoken language
• the age of writing and recorded information
• the age of printing
• the age of mass communication
• the age of networked digital computing
We looked at what can be considered language in terms of written/recorded symbols (is there a way to decipher it? Meaning, is there more than one sample of it which can be used to compare and translate the symbols somehow?). Several different examples of ancient texts and writing were shown in the slides. Very interesting stuff. 

3) Spanish: Lower Intermediate (Not part of my degree or anything but still recapping it)
I haven't taken a Spanish class in about 7 years. Sure, I learned a bit at work these past few months but most of what Papi and Alberto taught us were cuss words and therefore not all that useful in an academic setting. I followed the lesson as best I could although I'm not at all sure what the professor was saying sometimes and she talked rather fast too. I got my Instant Immersion: Spanish CD-ROMs in my package from home this weekend so I'll be using those to play catch-up and have study times so as not to look dumb in class. I also downloaded a couple of Spanish dictionary and conjugation apps on my phone. Huzzah!

4) Information Management and Policy (INM 341)
We looked at different models of information management (TS Eliot/pyramid model vs LS Lowry/cognitive model and so on). We also discussed information as a resource and how it is different from other resources in that it is difficult to ascribe a monetary value to information. Information is also a unique resource in that it is sharable: it can simultaneously be given and retained.

5) Research Evaluation and Communication (INM 356)
This class will serve as a basis/introduction for our dissertation research. The lecture discussed the various ways of collecting and analyzing data, qualitative vs. quantitative data, and how important it is to set up a plan for how the information collected will be analyzed before even collecting said information. We talked about the pros and cons of the three different types of research: surveys; experimenting, evaluating, and observing; and desk research.

So that's the first week in a nutshell. I've received a few of my books in the mail already and am starting to read them. Well, I've read one chapter in one book so far. I'd say that's still progress. Week 2 starts tomorrow!