Jason Fleming
LIS 6726
Indexing/Abstracting
Drew Smith
July 19, 2006
Introduction
Web publishing has become commonplace, as any user with access to the Internet is able to create content to share with others. This has only served to further fuel the information boom. The current method of using personal bookmarks and search engine queries does not provide the necessary support for the active information seeker. New tools are needed to keep track of information as it is discovered and so prevent information loss. There is additionally a need for tools that facilitate reliable information discovery. Bookmarks were introduced in the early 1990s and quickly caught on as an effective way to represent hierarchical structures of found links. They were not without their limitations, though: users found it difficult to share their bookmarks, and the structures became unwieldy as the number of bookmarked URLs grew. Super bookmarkers who published their lists online provided the community with valuable information, but the links in these pseudo web pages suffered from the instability typical of URLs, which break as pages move or disappear. Another feature that is conspicuously missing from a bookmark is a time-stamp that would allow users to gauge during what period of their lives they found the item interesting (Abrams, Baecker, & Chignell, 1998). This feature is present in several tagging systems, giving users a quick glance at their own personal timeline of interest.
Tagging is a term that describes the act of assigning keywords to objects on the Internet to aid in future retrieval efforts. The criteria used when creating bookmarks are similar to those used when selecting information to tag: the user must determine whether the object is generally useful, of good quality, and interesting (Abrams, Baecker, & Chignell, 1998). Users assign single words that theoretically represent the aspects of the content that are of interest, and they can later retrieve items by performing a quick search of their tag library (Hammond, Hannay, Lund, & Scott, 2005). Interest in this functionality has increased, and a number of sites now offer some attributes of tagging. Folksonomies represent the action taken by the web community to gather tags together in aggregate to increase the opportunity for information discovery (Brooks & Montanez, 2006). The term itself is a colloquialism that is intended to convey the social aspect of indexing digital objects.
Folksonomies in action are easily viewed on the web sites Flickr and Del.icio.us, where users are able to tag photos and web sites respectively. Users are able to search for all the items they have tagged with a desired term or terms. The added functionality of aggregation comes in the guise of a tag cloud or a list of the most popular tags. The aggregate of Flickr tags can be viewed by navigating to http://www.flickr.com/photos/tags/. Popular tags are represented there, and the user can see at a glance which tags are most popular by noticing that some terms are larger than others (Chudnov, Barnett, Prasad, & Wilcox, 2005): the more popular a tag is, the larger the font in which it appears. This list changes continually as the aggregate of terms is re-indexed by the website. Del.icio.us offers the other type of folksonomic instance, a list of popular tags. Tags are listed continually as users create them, and the site sorts them by popularity. This listing can be found on the front page of Del.icio.us (http://del.icio.us/). It is interesting to note that the creators of the site have decided to limit the number of results to three entries per hour for a time span of five hours. This allows users to see at a quick glance what is “hot” right now. It is also possible to subscribe to the RSS feed of popular tags or of the tags generated by a single user (Godwin-Jones, 2006). This feed of continual tag popularity represents the topical current of the user aggregate.
An in-depth analysis of the structure produced by a tagging system, contrasted with the hierarchical structure typically provided by bookmarks, will serve to illustrate the advantages of folksonomies. Tagging systems have their share of detractors, who are quick to point out the various deficiencies inherent in the system. These points should be addressed individually, either with corrections or with a rebuttal illustrating the benefits of what is seen as a weakness. In either case, an examination of methods that can be employed to “fix” folksonomies will generate a positive discussion within the tagging community.
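The relationship between tag popularity and font size described for the Flickr tag cloud can be illustrated with a small Python sketch. This is only one plausible approach, not the method Flickr actually uses: the function name `tag_cloud_sizes`, the pixel range, and the linear scaling rule are all assumptions made for illustration.

```python
from collections import Counter


def tag_cloud_sizes(tag_counts, min_px=10, max_px=32):
    """Map each tag's usage count to a font size in pixels.

    Sizes are scaled linearly between min_px (least used tag) and
    max_px (most used tag). The pixel range is an arbitrary assumption.
    """
    if not tag_counts:
        return {}
    lowest = min(tag_counts.values())
    highest = max(tag_counts.values())
    span = highest - lowest
    sizes = {}
    for tag, count in tag_counts.items():
        if span == 0:
            # Every tag is equally popular, so use a middle size for all.
            sizes[tag] = (min_px + max_px) // 2
        else:
            sizes[tag] = round(min_px + (count - lowest) * (max_px - min_px) / span)
    return sizes


# Hypothetical aggregate of tags applied by many users.
counts = Counter(["cat", "cat", "cat", "travel", "travel", "library"])
sizes = tag_cloud_sizes(counts)
```

In this example the tag applied three times receives the largest font and the tag applied once receives the smallest, which is the visual cue a reader of the cloud relies on.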
Hierarchy vs. Post-Coordinate Index
Bookmarking represents a hierarchical structuring of web content as created by the user. This organizational method stands in contrast to that of tagging systems, which are flat by design. Tags are set up in a post-coordinate index, where each term carries equal weight with every other term (Lancaster, 2003). No relationships exist between terms, so the word cat is not associated with the word kitten. Many detractors point to this as a flaw that inhibits the organization of information in a meaningful fashion. Bookmark structures have a related problem, in that folders might be created and structured in a biased manner. Relationships in indexes can have similar flaws, as the relationships between terms and content may be chosen subjectively, depending on the indexer's level of familiarity with the content (Fidel, 1994). Hierarchical systems also suffer from a fixed-state condition that limits users to one choice of location for an item within the structure. Tagging systems do not have this problem because all objects are accessible on a flat playing field (Golder & Huberman, 2005). A tagging system is similar to an indexing structure in that it is an art form that allows an item (page) to be referenced by a tag (entry) in a number of different ways (Mulvany, 2005). Tags, unlike bookmarks, represent a large number of access points that can produce the correct answer for the user (Tonkin, 2006).
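The flat structure described above can be sketched in Python as a simple mapping from each tag to the set of items that carry it. The class name `TagIndex` and the example URLs are hypothetical; the sketch only illustrates the idea that every tag is an equal access point and that terms are combined when the search is made rather than when the item is filed.

```python
from collections import defaultdict


class TagIndex:
    """A flat index: every tag points directly at the items that carry it."""

    def __init__(self):
        self._items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        # An item can be filed under any number of tags at once,
        # unlike a bookmark that lives in exactly one folder.
        for t in tags:
            self._items_by_tag[t].add(item)

    def search(self, *tags):
        """Return the items that carry every one of the requested tags."""
        if not tags:
            return set()
        matches = [self._items_by_tag.get(t, set()) for t in tags]
        return set.intersection(*matches)


index = TagIndex()
index.tag("http://example.org/kittens", "kitten", "photos")
index.tag("http://example.org/cat-care", "cat", "health")
index.tag("http://example.org/kitten-care", "kitten", "health")
```

A search for `health` reaches both care pages, while a search for `cat` does not reach the pages tagged `kitten`, because the index records no relationship between the two terms.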
Anti-Folksonomy
Tagging systems are not without their critics, who are quick to point out the negative aspects of the system. Sloppy tagging is the name given to the creation of tags that by their nature weaken the system (Guy & Tonkin, 2006). Tags are widely used for many reasons, one of which is that they can be applied quickly and often without much cognitive effort. This may be the attraction of organizing information by this method, but it often leads to user error. Tags that are spelled incorrectly can result in a loss of information, as users lose track of items they have tagged with the wrong term. The entry point is also split when a user tags similar items in different ways, for instance by pluralizing some tags and not others. Pluralization is still contentious in the indexing world as well, as evidenced by recent emails on the Index-L listserv, which made it clear that not everyone agrees on how pluralized entries should be entered (Browne, 2006). It is most important that users are consistent within their own tagging libraries, but for the sake of folksonomies it also becomes necessary to consider how the same tags are constructed across the entire database of users. One answer might be to use a concept that has been championed in the indexing world, known as double-posting. This method is used when an indexer wants to ensure that all users are able to find the information they desire regardless of how it has been entered. In this instance a tagger would enter both the singular and the plural, or whatever other forms the word might take (e.g., folksonomy, folksonomies).
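Double-posting could be supported by the tagging software itself. The following Python sketch assumes a small, hand-maintained list of variant groups (`VARIANT_GROUPS`) and a hypothetical helper named `double_post`; neither comes from an existing tagging site. A real system would need a far larger list or a proper linguistic tool to recognize plural forms.

```python
# Hypothetical, hand-maintained groups of word forms that should share entries.
VARIANT_GROUPS = [
    {"folksonomy", "folksonomies"},
    {"tag", "tags"},
]


def double_post(tags, variant_groups=VARIANT_GROUPS):
    """Return the tags plus every known variant form of each tag.

    Tags are lower-cased and stripped of surrounding spaces first, so that
    "Folksonomy" and "folksonomy" do not become separate entry points.
    """
    expanded = set()
    for tag in tags:
        normalized = tag.strip().lower()
        expanded.add(normalized)
        for group in variant_groups:
            if normalized in group:
                expanded |= group
    return sorted(expanded)


posted = double_post(["Folksonomy", "indexing"])
```

Here an item tagged only with the singular form is also posted under the plural, so a searcher using either form reaches it.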
Another example of sloppy tagging is the use of homonyms (e.g., drive, which may refer to driving a car or to a computer storage device). This can create confusion for searchers who want information on one subject but are directed to another, completely unrelated one. One solution to this problem might be to educate tag creators and encourage them to construct tags in a way that makes the intended sense explicit. For example, the word drive would not stand alone; it could be joined to another word such as computer to give computer-drive. This represents a subheading entry that should be familiar to users of back-of-the-book indexes (Mulvany, 2005). The problem with this approach is that users would have to come to some consensus on how these subheadings are constructed, or else different users will construct the same compound with different symbols (-, /, +). This might be where the site creator can step in and limit the types of symbols that can be used. Tag bundles are new folksonomic tools that have been developed to create tags for a group of tags. They allow users to create a hierarchy of tags, which may allow for new ideas (Hammond, Hannay, Lund, & Scott, 2005).
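One way a site creator might limit the symbols used in compound tags is to rewrite every accepted separator into a single agreed form when the tag is saved. The Python sketch below is an assumption about how this could work; the function name `normalize_compound_tag` and the choice of the hyphen as the standard separator are illustrative only.

```python
import re

# The three separators mentioned above: hyphen, slash, and plus sign.
_SEPARATORS = re.compile(r"[-/+]+")


def normalize_compound_tag(tag, separator="-"):
    """Rewrite a compound tag so its parts are joined by one standard symbol."""
    parts = [part for part in _SEPARATORS.split(tag.strip().lower()) if part]
    return separator.join(parts)


variants = ["computer-drive", "Computer/Drive", "computer+drive"]
normalized = {normalize_compound_tag(v) for v in variants}
```

All three spellings collapse to the same entry, so users who prefer different symbols still contribute to one shared access point.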
Tags are primarily personal in nature, and as such tag creators will try to make the structure work for themselves. This practice can dilute the folksonomic structure with imprecise words that serve only their creator. Examples of these kinds of tags include, but are not limited to, todo, toread, and towatch (Brooks & Montanez, 2006). These terms are ambiguous and mean nothing to other users, but they can be very beneficial to their creators, who are attempting to keep track of objects they are actively interested in (Guy & Tonkin, 2006). The answer to this problem may be to create a separate tag line for the item being indexed. This practice is familiar in the indexing world, where separate indexes are created for a single book to represent different aspects that cannot come together in a single index (Mulvany, 2005).
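The idea of a separate tag line for personal tags can be sketched in Python as follows. The set `PERSONAL_TAGS` contains only the three examples named above, and the function name `split_tag_lines` is hypothetical; an actual site would more likely let each user mark a tag as private than rely on a fixed list.

```python
# Only the example tags named in the text; a real list would be user-defined.
PERSONAL_TAGS = frozenset({"todo", "toread", "towatch"})


def split_tag_lines(tags, personal_tags=PERSONAL_TAGS):
    """Separate tags into a shared line and a personal line.

    Shared tags would feed the folksonomy; personal tags would be kept
    only in the creator's own library.
    """
    shared, personal = [], []
    for tag in tags:
        if tag.lower() in personal_tags:
            personal.append(tag)
        else:
            shared.append(tag)
    return shared, personal


shared, personal = split_tag_lines(["folksonomy", "toread", "indexing"])
```

The creator keeps the reminder tag, while the aggregate view sees only the two descriptive tags.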
Benefits of Folksonomies
Folksonomies bring together many points of information that have been collected by diverse groups of people, who are unknowingly working in concert to make something that is bigger than any individual contribution. Every participant can reap the reward of this effort, as folksonomies encourage information discovery (Surowiecki, 2004). It is important to point out, however, that this is not a closed system, and it is possible for users to corrupt the set of data by committing what is known as tag fraud. If false data is entered to lure unsuspecting users to sites that are designed to infect computers, or to entice them into spam-like schemes, then the system does not benefit. Reporting mechanisms need to be in place to ensure that this behavior does not continue unchecked. This is a problem that popular online systems must face: as they grow, so does the number of people trying to take advantage of them. It is also possible to create a system that requires user authentication when an account is first created. This is traditionally done by emailing the user a link to a verification page. It is easy to talk about the benefits of folksonomies, but it is wise to consider the dangers as well.
Fixing Folksonomies
The main problem with folksonomies is also their greatest feature, namely the simplicity of tag creation and information gathering. When offering potential fixes to the folksonomic problem, article authors are usually quick to mention that changes should be minimal and made in a gradual manner (Brooks & Montanez, 2006). The Web is immense, and professionals simply do not have the time to index everything on the Internet (Golder & Huberman, 2005). Tagging is an open-source type of solution that can be utilized by anyone on the Internet. Tags can be created during resource discovery by using bookmarklets or Firefox extensions that allow users to make tags without leaving the site they are on (Hammond, Hannay, Lund, & Scott, 2005). For folksonomies to be most beneficial, they need to overcome some of the problems that have been pointed out. How to instruct the user to make better tags is one of the issues that must be addressed.
Several researchers at the University of North Carolina undertook a program to educate authors in the construction of metadata for their own web content. This group of users was given a half-hour course in Dublin Core metadata creation and then tested by being asked to tag their own content. The study showed that users can be taught to create good tags that are similar enough to ones created by professionals. There is no substitute for years of training, but some instruction given to the content author was the next best thing (Greenberg, Pattuelli, Parsia, & Robertson, 2001). Unfortunately, not every tag creator can be sat down in a classroom and given a lecture about the finer points of polysemy (a single word carrying several related meanings). The answer might lie in a voluntary tutorial made available to the user in an unassuming fashion. If ideas that encourage the productive evolution of tag semantics are propagated, that might solve today’s set of issues. Proponents of folksonomies will be quick to point out that this is a natural phase for the language to go through before a grammatical solution is presented (Guy & Tonkin, 2006). The quick and easy way that tags can currently be added to objects as index points on the Web certainly points to their usefulness as a tool. Future improvements will be made as the community decides changes are needed. Del.icio.us has already started offering a list of suggestions for tag creators to choose from when constructing an entry (Hammond, Hannay, Lund, & Scott, 2005). These suggestions are based upon entries that have been created by other users. Unalog (http://unalog.com/), another folksonomic site, has a hints page for searching, and its creators might consider a hints page for tag creation as well.
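Suggestions based on the entries of other users could be produced by counting how often each tag has already been applied to the same page. The Python sketch below is a guess at the general idea, not a description of how Del.icio.us builds its list; the function name `suggest_tags`, the shape of the `postings` data, and the example URLs are all assumptions.

```python
from collections import Counter


def suggest_tags(url, postings, limit=3):
    """Suggest the tags that other users most often applied to this URL.

    postings is a list of (url, tags) pairs, one pair per user entry.
    """
    counts = Counter()
    for posted_url, tags in postings:
        if posted_url == url:
            # Count a tag once per entry, even if a user repeated it.
            counts.update(set(tags))
    return [tag for tag, _ in counts.most_common(limit)]


# Hypothetical entries made by four different users.
postings = [
    ("http://example.org/indexing", ["indexing", "reference", "books"]),
    ("http://example.org/indexing", ["indexing", "reference"]),
    ("http://example.org/indexing", ["indexing"]),
    ("http://example.org/other", ["travel"]),
]
suggestions = suggest_tags("http://example.org/indexing", postings)
```

Offering the most common existing tags first nudges a new tagger toward the vocabulary the community already uses, which addresses spelling and pluralization drift without requiring any formal training.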
Another suggestion has been offered that would combine the benefits of a hierarchical system with those of the flat structure of tagging. Researchers in the Computer Science Department at the University of San Francisco have begun researching a way to examine a cluster of tags that point to a reference and combine that information with automated indexing of the site (Brooks & Montanez, 2006). This process can be performed by a series of computer “agents” that automatically find and auto-tag a web page (Godwin-Jones, 2006). The inferences made between the two sets can be used to create a pseudo-hierarchical structure that provides relationships between terms. The obvious advantage of this proposal is that users would retain the open structure provided by tags while also gaining the relationships generated by this process.
Conclusion
Folksonomies are a hot topic right now in the realm of resource sharing and discovery on the Internet. Whether they will stand the test of time or vanish back into the dust from whence they came has yet to play out (Tonkin, 2006). Their advantages are many: they are simple to create, easy to access, and easy to edit. Their disadvantages are controversial and may have been overstated in an effort to help the community grow toward a better-developed system. This process is analogous to hacking into computer systems to show site owners that their resources are not secure. With concerted community effort, the massive voluntary indexing of the Internet may prove to be the future of the Internet’s organizational structure.
References
Abrams, D., Baecker, R., & Chignell, M. (1998). Information Archiving with Bookmarks: Personal Web Space Construction and Organization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Retrieved on July 14, 2006 from the ACM Digital Library.
Brooks, C. H., & Montanez, N. (2006). Improved annotation of the blogosphere via autotagging and hierarchical clustering. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006), WWW '06 (pp. 625-632). New York, NY: ACM Press.
Browne, G. (2006, May 23). RE: plurals using (s). Message posted to Index-L listserv, archived at http://lists.unc.edu/read/messages?id=3400320#3400320
Chudnov, D., Barnett, J., Prasad, R., & Wilcox, M. (2005). Experiments in academic social book marking with Unalog. Library Hi Tech, 23 (4), 469-480. Retrieved on June 28, 2006 from the database: Emerald Insight.
Fidel, R. (1994). User-Centered Indexing. Journal of the American Society for Information Science, 45 (8), 572-576. Retrieved on July 11, 2006 from the database: Wiley Interscience Journals.
Godwin-Jones, R. (2006). Tag Clouds in the Blogosphere: Electronic Literacy and Social Networking. Language Learning & Technology, 10 (2), 8-15.
Golder, S., & Huberman, B. (2005). Usage Patterns of Collaborative Tagging Systems. Journal of Information Science, 32 (2), 198-208. Retrieved on July 11, 2006 from http://www.hpl.hp.com/research/idl/papers/tags
Greenberg, J., Pattuelli, M., Parsia, B., & Robertson, D. (2001). Author-Generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization. Journal of Digital Information, 3 (2), Article No. 78, 2001-11-06. Retrieved on July 15, 2006 from http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Greenberg
Guy, M., & Tonkin, E. (2006). Folksonomies: Tidying up Tags? D-Lib Magazine, 12 (1). Retrieved on June 20, 2006 from http://www.dlib.org/dlib/january06/guy/01guy.html
Hammond, T., Hannay, T., Lund, B., & Scott, J. (2005). Social Bookmarking Tools (I): A General Review. D-Lib Magazine, 11 (4). Retrieved on July 14, 2006 from http://www.dlib.org/dlib/april05/hammond/04hammond.html
Lancaster, F. W. (2003). Indexing and Abstracting in Theory and Practice. Champaign, IL: University of Illinois.
Mulvany, N. (2005). Indexing Books. Chicago: The University of Chicago Press.
Surowiecki, J. (2004). The wisdom of crowds: why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations. New York: Doubleday.
Tonkin, E. (2006). Folksonomies: The Fall and Rise of Plain-text Tagging. Ariadne, 47. Retrieved on June 28, 2006 from http://www.ariadne.ac.uk/issue47/tonkin/intro.html