Library Technology

Developing technologies to support libraries.

Internet Librarian Monday Session 2 D102 Karen Coombs, Jason Clark

Well, I have had some time to sleep and reorient myself with what I do at work. Here are the notes I took during the session titled “Innovative Uses of Web 2.0 Technologies”.

Jason Clark
Karen Coombs
Web 2.0 tech and innovative uses
“Introducing web 2.0 into library websites”

Architectural things are done on the backend to make the web site friendlier

There is a movement towards a more active and dynamic web

Examples

social software
weblogs
folksonomies
wikis
podcasts
web APIs
AJAX

concepts
-radical decentralization: developed content management system to let people make their own changes. lets them make changes to any pages, has to be approved by that particular webpage owner.

Libraries should be encouraged to incorporate wikis and blogs into the library site so that the users are able to be more creative

-small pieces loosely joined: use whatever software or piece of programming that meets the needs at the time. For each problem build different modules to meet the need. This makes content reusable anywhere, and it lets faculty members grab the information they need. Aspects of the content management system can be replaced as needed. This makes the library site more flexible

-perpetual beta: Deploy systems early, and make constant improvements. Bring the system up as soon as possible, then get feedback and make changes as needed. A small group of library staff should do the alpha testing and then the rest of the library does beta testing. Enhancement requests can be posted on a wall and the development team can be constantly working to make improvements. Mindset: it won’t always work, it is not set in stone, and it is open for improvement. Constant change. It is painful for librarians to change; the perpetual beta idea helps them change and gives them the sense that something new is always coming

-remixable content: intention to build API that allows incorporation into other systems. API lets others pull in all the content they want. use AJAX to add database link to any wiki blog etc.. Once functionality and APIs built everyone can use it. Doesn’t ask everyone to be a brilliant coder to work with data.

-user as contributor: users can contribute to what is going on with library website. Idea to let users have access to update and change page i.e. UTHINK. User tagging and review of content in catalog**

-rich user experience: multimedia -screencasting. how to do things etc.. Personalization and customization. Balance personalization with privacy issues. Spaces for collaboration and interaction.

University of Houston library

Further resources
John Blyberg, Paul Miller

***

Jason Clark

Social tagging and folksonomies in practice
ROJO
i.e. del.icio.us, Amazon,
ability for user to contribute and make resource their own and share knowledge with larger group.
Library setting:
-additional access points in library catalog
-user vocabulary , organize info
-communities of practice centered around book or article.
-organize group of pages for library subject guide.

PennTags. del.icio.us clone for library universe at UPENN
users use University IDs allow University community to come in and tag library resources.

(IDEA: make tagging utility for libraries with individual institutional access)

Montana: Beta electronic theses and dissertation component to library system-giving users ability to tag these resources.

-open system that doesn’t need for users to login.

social tagging :why does it work.
*adaptable
*maps and displays simple relationships
*not complex

potential problems:
-lack of precision:synonym and term drift.
-vulnerable to “gaming” of system (if a group of users coordinate they can decide what terms belong to a particular resource),
-some people are wrong and don’t understand what they are tagging.

Goal:
*mine data from users and build a better system for them.

zoom cloud / tag cloud

blog on tagging

tagsonomy.com

coders: PHP MySQL etc

open source options
–freetag

Blogs Wednesday Internet Librarian 2006

Walter Nelson from RAND
Syndication and website content.

using blog related tools in other ways
customers don’t understand rss, so hard to feed it to community
people get webpages, so if you can put it on web page then your users can use it

moveable type-installation the hard part.
installed on server
canned formats can be used.
or you can make it your own. make it to look like any other website.

you can make moveable type into a CMS
Features:
-easy to use
-control of authors
-generates static web pages
-automatically updates rss feed.

!Think outside the blog!
-blogs are just a tech with useful features
-moveable type can be used for other purposes, you don’t have to use it just for blogging purposes.

Feed2JS

2nd piece of puzzle
freeware from Maricopa Community College. use it to generate javascript and it displays your rss as a bulleted list

allows you to propagate info broadly. you can use their server to parse your rss feed, but not recommended.

install feed2js on your own server.
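
To make the "rss as a bulleted list" idea concrete, here is a minimal Python sketch of what a tool of this kind does: parse a feed and emit an HTML list of linked headlines. This is not Feed2JS's own code; the function name `rss_to_bullets` and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample feed, standing in for a blog's real RSS output.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Library Announcements</title>
    <item><title>New hours &amp; holidays</title><link>http://example.org/blog/hours</link></item>
    <item><title>Database trial</title><link>http://example.org/blog/trial</link></item>
  </channel>
</rss>"""


def rss_to_bullets(rss_text, limit=10):
    """Return an HTML <ul> with one linked headline per RSS item."""
    channel = ET.fromstring(rss_text).find("channel")
    lines = ["<ul>"]
    for item in channel.findall("item")[:limit]:
        # Escape feed text so it cannot break the surrounding page.
        title = escape(item.findtext("title", default="(untitled)"))
        link = escape(item.findtext("link", default="#"), quote=True)
        lines.append(f'  <li><a href="{link}">{title}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


print(rss_to_bullets(SAMPLE_RSS))
```

Each headline links straight to its blog entry, which is the behavior described for the announcements section below.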

Announcements section on the RAND page is an RSS feed
-if you click on headline you will go directly to blog entry

Categories=taxonomical info

RSS feed -additional uses

add external newsfeeds
create static “link list”

category feeds
parse rss feeds into category
one blog creates multiple feeds

-set up branch libraries as categories
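
A rough Python sketch of the "one blog creates multiple feeds" idea: group the items of a single feed by their category, so each branch library gets its own list. The branch names and the helper `split_by_category` are invented for the example; a blogging tool would do this internally when it publishes category feeds.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical feed in which categories stand for branch libraries.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>Story time moved</title><category>Main Branch</category></item>
  <item><title>New DVDs</title><category>East Branch</category></item>
  <item><title>Holiday closing</title><category>Main Branch</category><category>East Branch</category></item>
</channel></rss>"""


def split_by_category(rss_text):
    """Map each category name to the list of item titles filed under it."""
    feeds = defaultdict(list)
    for item in ET.fromstring(rss_text).iter("item"):
        title = item.findtext("title", default="(untitled)")
        for cat in item.findall("category"):
            name = (cat.text or "").strip()
            if name:
                # An item filed under two categories appears in both feeds.
                feeds[name].append(title)
    return dict(feeds)


for branch, titles in sorted(split_by_category(SAMPLE_RSS).items()):
    print(branch, "->", titles)
```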

***
Using blogs for internal communications
Coombs

existing tech infrastructure for internal communication not well used.
it is easy for staff to create and maintain feedback
communication about projects needed.

diff kinds of blogs
-committee blogs, used to make announcements and post minutes and other committee documents, gather feedback on what the committee is working on.
-blogs for service points (access services, reference) maintained by staff who work at certain desk, announcements, active desktop on whatever computer someone goes to at the desk. great because it keeps the newest stuff on top
-working groups, groups working as a team: similar to committee blogs because they help users communicate, allows users to send important materials found that should be read.

Unresolved issues:
-blogs are private so there aren’t any rss feeds so hard for users to subscribe and keep up to date,
-don’t want to make some of them public because they don’t want some announcements to be seen generally.
-integration with existing authentication and authorization systems. yet another password to remember
- keeping up with blog permissions,: people leave accounts deleted, standard account management issue.
dynamic templating feature from moveable type:files aren’t created, relies on an older version of php, and if you upgrade php it breaks the system. -so looking at moving away from moveable type

***
Aaron Schmidt
*no one cares that you have a blog*
using blogs as a tool to put them on your website
to combat criticism–it is only a website.
it is about connecting and what they can do with website and how they can contribute to it.

lamson library-plymouth state – bisson using moveable type as opac
interaction happening at the opac.

Flickr: use uploading tool=piece of software that lets you drag and drop photos and bundle tags

flickr can be associated with web blog and you can send the photo to your blog
>flickr badge

new materials on flickr
westmount library, instead of these are the new titles they use a graphical representation and post them on flickr with notes, the notes describe the book and link it into the library opac.

meebo me–im amazing tool for weblog, allows communication in synchronous or asynchronous chat, widget.

blog elsewhere. — follow through get content into other pieces of blogosphere to drive information to you.

contribute to blogs in your community, flow will come back to you.

nationwide group project to contribute to social software web sites.

follow through-have a content plan, if you just let it sit there it will be static and won’t do you any good.

?’s

what other blog platforms are you looking at-looking at wordpress new.

integrates with corporate directory signons – so that would help with authentication.

moveable type is working on this kind of functionality too.

Track D301: Wikis in libraries: Internet Librarian 2006 Wed.

1st speaker:
Introduction to wikis
Nicole Engard

*share secretary duties from a meeting, everyone who was there can add comments.

wikipedia not to be used as primary research tool
can be used for information discovery

why use wiki?

>easy to learn
>share knowledge
>cross borders
>revert back to older editions=security
>you track who has made changes
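
The last two points (reverting to older editions and tracking who changed what) can be shown with a toy Python model. This is only a sketch of the idea, not how MediaWiki or any other wiki engine stores revisions; the class `WikiPage` and the user names are hypothetical.

```python
class WikiPage:
    """Toy page that keeps every saved revision together with its author."""

    def __init__(self, text="", author="system"):
        self.revisions = [(author, text)]

    @property
    def text(self):
        # The current text is simply the newest revision.
        return self.revisions[-1][1]

    def edit(self, author, new_text):
        self.revisions.append((author, new_text))

    def revert(self, index, author):
        """Restore an older revision by saving it again as the newest one."""
        self.edit(author, self.revisions[index][1])

    def history(self):
        """List (revision number, author) pairs, oldest first."""
        return [(i, who) for i, (who, _) in enumerate(self.revisions)]


page = WikiPage("Meeting minutes", author="alice")
page.edit("bob", "Meeting minutes plus action items")
page.edit("vandal", "junk")
page.revert(1, author="alice")  # go back to bob's version
print(page.text)
print(page.history())
```

Nothing is ever deleted in this model, so a bad edit costs one revert, and the history still shows who made it.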

Mediawiki is free but has to be installed on your own server.

huge list of wikis
c2.com/cgi/wiki (some more to the address here to look up)

**
re-building intranet and wanted to have more collaboration.
2006 introduced intranet 2006
-more task oriented now, previously dept oriented.

*goal =radical decentralization=empower users

Nicole wrote the code herself because, when she started, she trained librarians on wikipedia to get them used to it and created a page where everyone was encouraged to go in and have fun. no one edited it, so she looked to see why they weren’t editing it and found out that syntax was the problem. wanted to develop new wiki using previously deployed application. –wysiwyg

-a lot of wikis don’t have structure and their organization needs one. (hierarchy)

icon (graphical) keys make it easy for users to edit pages

personal quick links on sidebar based on login

there is a button that allows users to submit page for checking to web team

editing tool looks like ms word for example –used wysiwyg pro (not free) powerful and works well with multiple editors on page.

*breadcrumbs at the top of pages–

***
Darren Chase
an agreeable wiki

-problem:working together collaboratively on projects can be difficult

*Bird problem story-

-whatever solution is they wanted to have control especially for the sake of Systems.

options
-static html
-blogs
-cms
-wiki

staff may have to meet a learning curve for the wiki

had to choose one, so considered different options

-pbwiki
-mediawiki
-twiki

-wikimatrix lets you compare all the different options

They chose twiki because it is easy to edit and it has good features, access control and versioning. Another good feature is the ability to make web pages.

building twiki:
perl module used instead of using sql database.
easy to install for systems department
-plugins

staff training and buy in important. or else the project will fail

Scholarship in Chaos

Flying high on the web or in free fall?
Rich Wiggins MSU

Microsoft, Elsevier, and Google representatives on the panel

The future?

E:dynamic market, phenomenal the info available. working on federation and analysis. peer review question, publishers becoming more important? in academic market this poses some challenges:

  1. versioning
  2. quality (peer review)
  3. archiving

scirus
giving structure to the data.

M: wants to broaden question. how it is relevant to academic librarian. has broad interest and appeal. user appeals are not fitting with typical academic researcher. queries span into diff regions. Excited about content from publisher or open access repositories. making it available using Academic Live search is what they are trying to do.

Question? what kind of searches are being done?
(from audience–science searches)

M:seeing searches from a wide audience.

G: (on the question of academic searching) -sharing a story- as an undergraduate on a small campus in India trying to find faster ways to multiply small numbers on chips, he went to the library, looked up articles, followed citations, sent work in, and reviewers told him it was nice but where has he been for the last five years (laugh)
Goal of what they are trying to do is make it possible for anyone anywhere to find research wherever it is produced throughout the world.
-Q: from Donna (LC) in the audience- in terms of scholarly publishing finds that audience is going to OA or blogs what do you have to say about scholarly publishing on blogging what should she do?-how do we capture scholarly info.
G: from google’s viewpoint -1st trying to make sure that formal information first (from google scholar POV)
Donna:trying to capture serious scholarly blogs that actually have scholarly info- how is google keeping up with this new trend?
G: what is the problem.

from the audience: blogs are ephemeral but they might have good academic material.
E: so what you are saying is that there are new modes of publication, I would recommend that you go to scirus because they don’t just look at traditional publication types. important to make a distinction between blogging and peer reviewed material and scirus is able to do this.

M: come up with way to identify blog as scholarly so it can be cataloged and made discoverable. =standards

Audience: how to preserve blog content

Barbara: this happens with scholarly journals as well. failure of archiving

E:challenges of new era -archiving and versioning already mentioned (see above) that is why we have peer review process

(scirus seems like a unique thing, how do you address what is on the web?)

versioning and archiving is responsibility of publisher.

Barbara: google scholar approach, struck by fact that when you cluster material you weren’t limited to types of products. do you follow into blog country to pursue the authors thoughts.

G:archiving significant problem
Barbara:frequently google’s cache returns things that have been gone
host:wife’s cat has thyroid problem and radiating the thyroid kills it and another gland takes over. google search for radioactive cat how do we find important blog articles

g: important articles will rise to top
host:what about controversy
g:if we can analyze it then the one with more people believing in it will rise to the top. we have to depend on what the people in the field talk about.

m: uses similar techniques for relevancy ranking.

host:serendipity. throwing random things into hitlist?
M:no.

e: agrees, relevancy is difficult to compute. trying to show results from different source, scirus has keyword terms to give clickable related terms.

host: theory of comprehensive search, woman part of clinical setting, shown to have toxic effects that were not indexed online. dr. only looked online and was not able to find this info so lady was killed.

e: crucial question. difference between search engines and A&I databases. A&I database goes back and enters information. In terms of comprehensiveness search engine cant really do anything about this.

host: if your relative had complicated condition what databases would you search.

e: abstracting & indexing database material determined by editorial design. Would not google it.
m: focused on historical material in printed form, only a matter of time before information made available. Start with brother who is a surgeon.
G: I would search everything, every possible thing no matter where. Would not stop at google.
Barbara: contact medical librarian.

from audience: controlled vocabulary? And date searching?

g: search logs show no vocab based searches so doesn’t think it is very important, there are issues because people don’t know what problems are.

Donna: the law, when authoritativeness –when most important piece of law could mean someone going to jail or not. Working on something bringing all the different pieces of law from around the world into one place and using an authoritative language to go behind it. They help you get over asking the right question; what you don’t know doesn’t hurt you. Software and hardware.

if you type in a particular term then a sidebar could pop up that have the related terms

Donna:knowledge in thesaurus beyond what any one person may know.

e: for example if you type in heart attack you get cardiac failure as well. Thesaurus important. In scirus there is a controlled vocabulary.
m:good feature idea for navigating through information. Looking at different types of taxonomies. writing software to draw vocab from text. investing in this idea.
g:problem with concept refer across different languages and fields is complex and not easily resolved by using ontologies. you can go back and look at common use and scenarios. when a new concept is used it is described in previous terms and then it is named and referred by new term from then on. simplest thing to do is expand queries. keeping track in drifts–influence across different fields is hard to achieve but it does not solve by using ontologies to say this is solving the problem
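
A minimal Python sketch of the query expansion being discussed: look the typed term up in a thesaurus and OR in the related terms. The one thesaurus entry is just the heart attack example from the panel, and `expand_query` is a made-up helper; none of the three search engines described their actual implementation.

```python
# Tiny stand-in for a controlled vocabulary; the single entry is the
# panel's own example, not a real thesaurus record.
THESAURUS = {
    "heart attack": ["cardiac failure"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return a boolean query that also matches the thesaurus's related terms."""
    key = query.strip().lower()
    terms = [key] + thesaurus.get(key, [])
    # Quote each term so multi-word phrases stay together.
    return " OR ".join(f'"{term}"' for term in terms)


print(expand_query("Heart Attack"))
print(expand_query("dna testing"))
```

A term with no thesaurus entry passes through unchanged, so expansion never makes a search narrower.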

host: if I search dna testing how are you mapping it to say “did you mean” kind of improvements.

g: collaborative filtering: no
m: no,
e: no, but it is an area where search engines might be moving to make use of the community that is using it.

host: are you capturing click throughs?

e: we don’t have log analysis on which pages are being clicked on
m: yes
g: of course

Barbara:Q? the problem with taxonomy is that they don’t solve the problem by going back in time, but what about clustering, relevance ranking in a certain field of expertise, if I want to use equity as a legal term that is one grouping but if I want to use it as a company that is another search, I would like to be able to narrow search. Northern light and ask.com do this.

host: so disambiguation

e: diff approach all documents are classified by scheme you can assure that you are only search within a specific scheme. based on training set took articles from classification they knew and used terms to reclassify terms in scirus
m:no we don’t but it is an interesting term. difficult to solve
g: cluster instances of same work, tricky to figure out right granularity for it to be good for the user, attractive and easy to use for specific or artificial examples, to do it across the board and so it is easy for the user would be quite difficult

host: date of document

e: in scirus you can,
m:you can sort by date,
host:problem is the web site doesn’t tell you when items were last updated.
g: yes, they have to be able to see if pdf and doc are same. Continuum – question -is it useful as a result, if not it should be removed. how to figure out date.. in some cases you can, depends on data, on academic scholarly cases it is possible because it says so. harvest full text and analyze document itself, do not rely on info provided to them. possible in some cases to do a reasonable job.

audience: do none of you search informal scholarly literature. what sources.
g: can you define informal literature – audience:blogs
m:no blogs
e: we don’t focus on blogs, but if we determine it is scientific then it is absorbed.

host: who decides if something is scholarly?

e: analyze terms in documents and based on that (vocab) they are marked as scientific, use relevancy score.. it is robotic

host:will satirical scientific journal be included
e:great trust in algorithm
g: at this point they don’t include blogs because they don’t feel confident that they can decide what is valuable.

audience:pre print server issues?
g: yes they
m: yes actively try to find OA
e:yes if it is scientific

host:self archiving issue is something to ponder

audience: what if A&I opened their info to you what would you do with it? (structured xml)
m:we would love to have high quality metadata.
e:work with constructing, medline database in scirus and use data in there
g:would take it in a heartbeat.

audience:that is the problem, what search engines are not getting is the problem. disconnect between electronic publishing systems and getting it into the search engines.

e:we work closely with publisher and we don’t agree with that. open archiving initiative is making good crossroads. we work with 10 publishers and just signed agreement with crossref-metadata and then full text, protocol developed by crossref. Estimation that there are 4000 stm publishers.
g: if a&I were able to provide data in xml then they would take it in a heartbeat. dealing with the

barbara:bottom line. services handling info needs when librarians are not there. what can we do to help you do a better job, for example open access good for everyone should we contribute to lobby for this?

G:the bills are paid by the librarians, if you want a comprehensive search , content that is licensed must be indexed, it will improve search dramatically.
m: provision of content and make sure it indexed, putting structure on content, classification of blogs for example and publishing of metadata etc..

host: interesting to see who defines standards

e:working together with librarians is key. they are the experts on what users need, way IR are licensed is a cooperation. no one disappearing.

audience:synergy between formats? (pdf/html)

m:xps,
e:no one solution pdf is dominant form right now

audience: user behavior observances?

e: people start broad, they are afraid to miss information. they want tools to help them refine, they have a few products they go to for information.
m: breadth of use interested in scholarly works
g:the only unusual thing is that they are different from normal searches, advanced search page not used very often only used to limit time period to last five years.

Internet Librarian 2006: Monday Session 1 Track D

web 2.0 Library 2.0 and more
Dr Paul Miller
TALIS

white papers
do libraries matter?
library 2.0: the challenge of disruptive information

reaching out to where the users are

opac button from content management system is not library 2.0

library 2.0=hype?

library 2.0 message = opening the library up. getting our knowledge out of the building

engage users to find out what they want instead of shaping them to meet what we think they need

there is a need to disaggregate giant systems

break everything up into modules that make sense

–need for real meaningful systems

libraries should not duplicate financial systems. i.e. using peoplesoft instead of ACQ module.

key is to deliver better services to end users
shared innovation-things going on around the world. need to open innovation that is spread out so that everyone can learn from the knowledge

mashup competition from TALIS is an example of trying to draw things together where everyone can see it

1. Jon Udell took info from his library’s catalog and put it where they can see it on their google search page

he created a simple app that puts info where the users are..

2. second life library one advantage is giving SL users access to resources like worldcat

taking library info and putting in places where users are more comfortable

competition continues

Innovation Directory

a place where they are trying to bring innovative things together in one place.

useful place to look to see what peers are doing

what makes library 2.0 possible
-falling cost of storage
-falling cost of computing power
-Growing connectivity *pushing information along quickly
-camera phones *active consumerism, everyone has something to give.
-commodisation *instead of figuring out what you need first before starting a project, instead you just buy things as you need them.

Three o’s
-open source *new ways of getting the software we need, important to remember that it isn’t necc free. buying staff time etc.. to maintain.
-open data *need to be better at opening up access to the data we have. currently asserting more control than maybe we should
-open APIs *once you have open data you need to give users access to them.=programmatic access that allows machines access to the data on behalf of the user.

Essence of library 2.0 –architecture of participation
community of users coming together making data work for them
blogs wikis just an example of user coming together.
blogs are just a diary. the participation is the piece that is more important than individual participation.

seeing library allowing their users to participate and engage

*Architecture of participation.
are vendors participating??
does vendor engage in open dialogue?

*Platform of participation

state of Georgia employed a lot of programmers to develop and support open source ILS (pines)

TALIS: keystone – recognizes we all have traditional ILS
in the meantime we build open source modules to sit on top of what we already have. allows access to institutional portal. -gradual evolution within organization

lets you make things possible without throwing out existing software

data is more important than the software -

Open data in the library
current model broken. wrong to pay people to tell users what we have.
aggregate model flawed because you have to pay to join and difficult to switch and move around

shared data should be able to move around and made easy to move around

its not about just giving access to users to do whatever they want with the catalog.

Open Systems.
systems that plug together and make use of what we have

one example: Talis silkworm

attempt to gather info about library and expose that for 3rd parties to take and make use of

silkworm pages tell users how they can send information and receive data from particular libraries

need to build data warehouses and provide access to them

TALIS example : bigfoot
no idea to understand what is happening until you build the application

build interface so users can see how api is working

rdf result set is something that software app can do something with

you can send requests for holdings info etc..

data store can hold diff things- user comments, review, book cover info etc…

i.e. grease monkey scripts for user results from your library.

with directory you don’t have to write a diff script for each library

useful to see all library that hold book, but not necc useful. better to see books near you that have it etc.

=grouping web service lets you build group of libraries with meaning to you.

this grouping pulls books from libraries you are interested in from whatever website you are at.

TALIS: CENOTE
building on underlying structure, any library should be able to do this. contribute their info and provide info so that they can build services or allow others to do so as well

we need to liberate data and be able to build these services.
need to get data to users

Tagging Presentation- (courtesy of Slideshare)

Library Tagging Ideas

The objective is to create a functional service that is easy to use and would allow students, staff and faculty at the University of Florida to add to, and draw from a database of stored information. The resulting database will be a warehouse of bookmarks generated by the user. These bookmarks will link users to resources they have previously discovered from using the Library catalog, purchased resources, or research materials from the internet. Information will be organized by USER ID and by assigned Tags that identify the information for the user and others. Keywords assigned to bookmarks are considered tags that serve as index points into the list of saved resources.

Several types of similar projects are available for inspection on the internet. Commercial applications include such websites as del.icio.us and Flickr. The closest example in the academic community can be viewed at PennTags (http://tags.library.upenn.edu/). The PennTags project allows users to tag, or bookmark instances of information in the catalog and on the internet. The UF Tagging Project would perform a similar function, but would have some added features.

Most tagging implementations allow the user to only tag items on one level. I am proposing a system that would allow the user to create two sets of tags for any given resource. The first set would allow them to create social tags that are designed to allow others to find and use the same information. These keywords would describe the object in such a way that it would be evident to other users what the underlying concept is. The other set of tags that could be associated with the item would be personal tags. These “personal tags” would allow the user free rein to create a new way to organize the information, and would not be shared. Examples of this kind of tag include “toread”, “free”, or involve the use of special characters.

A controlled vocabulary will be put in place to monitor the social tags. If tags are not contained within the controlled vocabulary the user will have an opportunity to decide to temporarily add them pending review, or to tag them as personal. Social tags can be viewed by the entire community, but personal tags can only be viewed by the user. If a user wishes to keep a tag private then they will tag it as personal.
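
The tag workflow described above could be sketched in Python roughly as follows. The vocabulary contents, the status names, and both function names are placeholders of mine, and I am assuming here that pending tags are visible to the community while they await review; the proposal itself does not settle that point.

```python
# Placeholder vocabulary; a real one would be maintained by the library.
CONTROLLED_VOCABULARY = {"folksonomy", "metadata", "cataloging"}


def classify_tag(tag, personal=False, vocabulary=CONTROLLED_VOCABULARY):
    """Return 'personal', 'social', or 'pending' for a proposed tag."""
    if personal:
        return "personal"
    if tag.lower() in vocabulary:
        return "social"
    # Not in the vocabulary: added temporarily, awaiting review.
    return "pending"


def visible_tags(bookmark_tags, viewer, owner):
    """Personal tags are shown only to the owner; everything else is public."""
    return [tag for tag, status in bookmark_tags
            if status != "personal" or viewer == owner]


proposed = [("metadata", False), ("web2.0", False), ("toread", True)]
tags = [(tag, classify_tag(tag, personal=flag)) for tag, flag in proposed]
print(tags)
print(visible_tags(tags, viewer="bob", owner="alice"))
```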

Tagging systems allow us to create new tools to aid users with their resource discovery tasks, or to help users organize the information they have amassed. Tag Clouds allow users to see at a glance what subjects are popular in the community. This can be accomplished by increasing the font size of a tag as the number of instances increases. An Interest Tracker is a timeline that would allow users to see visually what keywords they have been using, and with what frequency since they started using the tagging system.
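
As an illustration of the Tag Cloud sizing rule, the Python sketch below scales each tag's font size linearly between a minimum and a maximum according to how often the tag has been used. The pixel range and the function name `tag_cloud_sizes` are assumptions; other scalings (logarithmic, for example) would serve equally well.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_px=12, max_px=32):
    """Map each tag to a font size (px) that grows with its usage count."""
    counts = Counter(tags)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: round(min_px + (n - low) * (max_px - min_px) / span)
        for tag, n in counts.items()
    }


usage = ["python"] * 5 + ["rss"] * 3 + ["wiki"]
print(tag_cloud_sizes(usage))
```

The least-used tag always gets the minimum size and the most-used tag the maximum, so the cloud stays readable however large the raw counts become.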

A reputation scale will be employed to give users an additional means to decide whether a resource is valuable or not. A user’s reputation will fluctuate as other users rate them by what they have tagged. If another user finds the tagged item useful then the original poster’s rating will increase and vice-versa. Librarians will start out as power recommenders giving their tagged resources a greater degree of authority in the system.
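
One possible shape for the reputation scale, sketched in Python. The starting scores, the one-point step, and the class name `ReputationBoard` are all assumptions made for the example; the proposal specifies only that ratings move up and down with usefulness and that librarians start with more authority.

```python
DEFAULT_START = 0      # assumed starting score for ordinary users
LIBRARIAN_START = 10   # assumed head start for librarians ("power recommenders")


class ReputationBoard:
    """Track a per-user score that moves as others rate their tagged items."""

    def __init__(self, librarians=()):
        self.librarians = set(librarians)
        self.scores = {}

    def score(self, user):
        start = LIBRARIAN_START if user in self.librarians else DEFAULT_START
        return self.scores.get(user, start)

    def rate(self, poster, useful):
        """Raise the poster's score for a useful item, lower it otherwise."""
        self.scores[poster] = self.score(poster) + (1 if useful else -1)


board = ReputationBoard(librarians={"ref_desk"})
board.rate("student1", useful=True)
board.rate("student1", useful=True)
board.rate("student1", useful=False)
print(board.score("student1"), board.score("ref_desk"))
```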

To increase usefulness, links to citations and proxy services will be employed as much as possible. The user should be able to easily migrate to resources no matter where they are. They should also be able to keep track of citations they use when writing papers or working on projects.

This project is aimed at fulfilling the library’s goal to simplify the search and discovery of library resources. Users will have a tool at their disposal to keep track of the items they have already discovered, and a means to find out what other users have noted as worth looking at. The interest tracking feature will guide users through their past search habits to help them keep on track with how their research has changed over time. The Tag Cloud feature will help them see what topics they find most relevant to their research and what the University as a community is studying.

This project would also increase the visibility of the libraries for the UF community. Users will see what a valuable resource the library has made available to them, and this will also increase their use of the resources available because they will be more exposed.

The Tagging system will enable the community of users to more effectively manage library provided resources. This grant will give us the opportunity to strengthen our programming skills and an opportunity to network with the creators of other tagging projects. As a result of this funding we will have the time and resources to follow through with a plan to implement this service for the UF community by next fall.

Criticisms and Benefits of Folksonomic Structures

Jason Fleming

LIS 6726

Indexing/Abstracting

Drew Smith

July 19, 2006

Introduction

Web publishing has become commonplace as any user with access to the Internet is able to create content to share with others. This has only served to further fuel the information boom. The current method of using personal bookmarks and search engine queries does not provide the necessary support for the active information seeker. New tools are needed to keep track of information as it is discovered to prevent information loss. There is additionally a need for tools that facilitate reliable information discovery. Bookmarks were introduced in the early 1990s and quickly caught on as an effective way to represent hierarchical structures of found links. These were not without their limitations, though, as users found it difficult to share their bookmarks, and the structures became unwieldy as the number of URLs bookmarked grew. Super bookmarkers who published their lists online provided the community with valuable information. The links in these pseudo web pages suffered from an inherent instability associated with typical URL breakdown. Another feature that is significantly missing in a bookmark is a time-stamp that would allow the user to gauge during what period of their life they found the item interesting (Abrams & Baecker & Chignell, 1998). This feature is present in several tagging systems, giving the user a quick glance at their own personal timeline of interest.

Tagging is a term that describes the act of assigning keywords to objects on the Internet to aid in future retrieval efforts. The criteria used when creating bookmarks are similar to those used when selecting information to tag. The user must determine if the object is generally useful, of good quality, and interesting (Abrams & Baecker & Chignell, 1998). Users assign single words that theoretically represent aspects of the content that are of interest. Users can later retrieve items by performing a quick search of their tag library (Hammond & Hannay & Lund & Scott, 2005). Interest in this functionality has increased, and a number of sites now offer some attributes of tagging. Folksonomies represent the action taken by the web community to gather tags together in aggregate to increase the opportunity for information discovery (Brooks & Montanez, 2006). The term itself is a colloquialism that is intended to convey the social aspect of indexing digital objects.

Folksonomies in action are easily viewed on the web sites Flickr and Del.icio.us, where users are able to tag items such as photos and web sites respectively. Users are able to search for all the items they have tagged with a desired term or terms. The added functionality of aggregation comes in the guise of a tag cloud, or a list of the most popular tags. The aggregate of Flickr tags can be viewed visually by navigating to the website http://www.flickr.com/photos/tags/. Popular tags are represented here, and the user can see at a glance which tags are most popular by noticing that some terms are larger than others (Chudnov & Barnett & Prasad & Wilcox, 2005). The more popular a tag is, the larger the font it will appear in. This list changes continually as the aggregate of terms is re-indexed by the website. Del.icio.us offers the other type of folksonomic instance, which is a list of popular tags. These tags are continually listed as users create tags, and the site sorts them by popularity. This listing can be found on the front page of del.icio.us (http://del.icio.us/). It is interesting to note that the creators of the site have decided to limit the number of results to three entries per hour for a time span of five hours. This allows users to see at a quick glance what is “hot” right now. It is also possible to subscribe to the RSS feed of popular tags or those tags generated by a single user (Godwin-Jones, 2006). This feed of continual tag popularity represents the topical current of the user aggregate.

An in-depth analysis of the structure produced by a tagging system, contrasted with the hierarchical structure typically provided by bookmarks, will serve to illustrate the advantages of folksonomies. Tagging systems have their share of detractors, who are quick to point out the various deficiencies inherent in the system. These points should be addressed individually, with corrections or with a rebuttal illustrating the benefits of what is seen as a weakness. In either case an examination of methods that can be employed to “fix” folksonomies will generate a positive discussion within the tagging community.
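The font scaling behind a tag cloud such as the one on Flickr can be sketched in a few lines. This is only an illustration, assuming a simple linear scale between a smallest and a largest font size; the function name, the point sizes, and the sample tags are invented here and are not taken from Flickr or del.icio.us.

```python
from collections import Counter

def tag_cloud(tag_assignments, min_size=10, max_size=32):
    """Map each tag to a font size (in points) that grows with its popularity."""
    counts = Counter(tag_assignments)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    spread = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: round(min_size + (count - low) * (max_size - min_size) / spread)
        for tag, count in counts.items()
    }

# Hypothetical tag assignments made by a community of users.
sample = ["cats", "cats", "cats", "travel", "travel", "sunset"]
sizes = tag_cloud(sample)
for tag, size in sorted(sizes.items()):
    print(f"{tag}: {size}pt")
```

The most-used tag gets the largest size and the least-used tag the smallest, which is the visual cue described above. A real site would likely recompute the counts on a schedule as new tags arrive.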

Hierarchy vs. Post Coordinate Index

Bookmarking represents a hierarchical structuring of web content as created by the user. This organizational method stands in contrast to that of tagging systems, which are flat by design. Tags are set up in a post-coordinate index, where each term carries an equal weight to any other term (Lancaster, 2003). No relationships exist between terms, so the word cat is not associated with the word kitten. Many detractors point to this perceived weakness of the tagging structure as a flaw that inhibits the organization of information in a meaningful fashion. Bookmarked structures have a related problem in that folders might be created and structured in a biased manner. Relationships in indexes can have similar flaws, as the relationships between terms and content may be chosen subjectively depending on the indexer's level of familiarity with the content (Fidel, 1994). Hierarchical systems also suffer from a fixed state condition that limits users to one choice of location for an item within the structure. Tagging systems do not have this problem because all objects are accessible on a flat playing field (Golder & Huberman, 2005). A tagging system is similar to an indexing structure because it is an art form that allows for a number of different ways an item (page) can be referenced by a tag (entry) (Mulvany, 2005). Tags, unlike bookmarks, represent a large number of access points that can produce the correct answer for the user (Tonkin, 2006).
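To make the flat, post-coordinate idea concrete, here is a minimal sketch in which every tag is an equal entry point and terms are only combined at search time. The URLs, tags, and function names are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

def build_index(tagged_items):
    """Build a flat index: each tag points straight at the items that carry it."""
    index = defaultdict(set)
    for item, tags in tagged_items.items():
        for tag in tags:
            index[tag].add(item)
    return index

def search(index, *tags):
    """Coordinate terms at search time: return items carrying every given tag."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()

# Hypothetical tagged pages.
tagged = {
    "http://example.org/pets": {"cat", "photo"},
    "http://example.org/litter": {"kitten", "photo"},
    "http://example.org/lens": {"photo", "camera"},
}
index = build_index(tagged)
print(search(index, "cat", "photo"))  # items tagged with both terms
print(search(index, "kitten"))        # "cat" items are not pulled in
```

Because no relationship is stored between cat and kitten, a search on one never returns items tagged only with the other. The same page can still be reached through any of its tags, which is the multiple-access-point advantage over a single folder location.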

Anti-Folksonomy

Tagging systems are not without their critics, who are quick to point out the negative aspects of the system. Sloppy tagging is the name given to the creation of tags that weaken the system by their nature (Guy & Tonkin, 2006). Tags are widely used for many reasons, but one of those reasons is that they can be applied quickly and often without too much cognitive effort. This may be the attraction of organizing information using this method, but it often leads to user error. Tags that are spelled incorrectly can result in a loss of information as users lose track of items they have tagged with the wrong term. This splits the entry point, because the same concept is now filed under two different tags. The same thing happens if the user pluralizes tags inconsistently. This issue is still contentious in the indexing world as well, as evidenced by recent emails on the Index-L listserv, which made it clear that not everyone is in agreement concerning how pluralized entries should be entered (Browne, 2006). It is most important that users are consistent within their own tagging library, but for the sake of folksonomies it also becomes necessary to consider how the same tags are constructed across the entire database of users. One answer might be to use a concept that has been championed in the indexing world known as double-posting. This method is used when an indexer wants to ensure that all users are able to find the information they desire regardless of how it has been entered. In this instance an indexer would tag both the singular and the plural, or any other forms the word might take (e.g., folksonomy, folksonomies).
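Double-posting could be automated at the moment a tag is saved. The sketch below is an assumption about how that might look, not a feature of any existing tagging site: it files each item under both the singular and plural form of a tag, using a deliberately naive English pluralization rule that will get irregular words wrong.

```python
def plural_variants(tag):
    """Return the tag plus a naive singular or plural counterpart."""
    tag = tag.strip().lower()
    if tag.endswith("ies"):
        other = tag[:-3] + "y"        # folksonomies -> folksonomy
    elif tag.endswith("s"):
        other = tag[:-1]              # cats -> cat
    elif len(tag) > 1 and tag.endswith("y") and tag[-2] not in "aeiou":
        other = tag[:-1] + "ies"      # folksonomy -> folksonomies
    else:
        other = tag + "s"             # cat -> cats
    return {tag, other}

def double_post(index, item, tag):
    """File the item under every variant so either form retrieves it."""
    for form in plural_variants(tag):
        index.setdefault(form, set()).add(item)

# Two hypothetical users tag related pages with different forms.
index = {}
double_post(index, "http://example.org/a", "folksonomy")
double_post(index, "http://example.org/b", "Folksonomies")
print(sorted(index["folksonomy"]))
```

Both pages are now reachable from either form of the word, so the entry point is no longer split. The cost is a larger index, which is the usual trade-off with double-posting.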

Another example of sloppy tagging is the use of homonyms (e.g., drive, which can refer to driving a car or to a computer storage device). This can create confusion for searchers who want information on one subject but are directed to another, completely unrelated one. One solution to this problem might be to educate the tag creator and encourage a tag syntax that removes the ambiguity. For example, the word drive would not stand alone; it could be joined to another word such as computer to give us computer-drive. This represents a subheading entry that should be familiar to users of back-of-the-book indexes (Mulvany, 2005). The problem with this is that users would have to come to some consensus concerning how these subheadings are constructed, or else different users will construct the same thing using different symbols (-, /, +). This might be where the site creator can step in and limit the types of symbols that can be used. Tag bundles are new folksonomic tools that have been developed to create tags for a group of tags. This allows users to create a hierarchy of tags that may allow for new ideas (Hammond & Hannay & Lund & Scott, 2005).
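One way a site creator might limit the symbols used in compound tags is to normalize every accepted separator to a single one when the tag is saved. This is a minimal sketch under that assumption; the choice of the hyphen as the canonical joiner and the set of separators recognized are illustrative only.

```python
import re

# Separators users might type between the parts of a compound tag.
SEPARATORS = re.compile(r"[-/+\s]+")

def normalize_compound(tag, joiner="-"):
    """Rewrite a compound tag so its parts are joined by one agreed symbol."""
    parts = [part for part in SEPARATORS.split(tag.strip().lower()) if part]
    return joiner.join(parts)

for raw in ["computer/drive", "Computer + Drive", "computer-drive"]:
    print(raw, "->", normalize_compound(raw))
```

All three spellings collapse to computer-drive, so users who favor different symbols still end up sharing one entry point.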

Tags are primarily personal in nature, and as such tag creators will try to make the structure work for them. This practice can dilute the folksonomic structure, because some words are meaningful only to their creator. Examples of these kinds of tags include but are not limited to todo, toread, and towatch (Brooks & Montanez, 2006). These terms are ambiguous and mean nothing to other users, but they can be very beneficial to their creator, who is attempting to keep track of objects they are actively interested in (Guy & Tonkin, 2006). The answer to this problem may be to create a separate tag line for the item being indexed. This practice is familiar in the indexing world, as separate indexes are created for a single book to represent different aspects that cannot come together in a single index (Mulvany, 2005).
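A separate tag line could be as simple as routing known personal tags away from the shared index when an item is saved. The sketch below assumes a fixed list of personal tags taken from the examples above; a real system would more likely let each user mark tags as private.

```python
# Personal, task-style tags named in the text; the list itself is an assumption.
PERSONAL_TAGS = {"todo", "toread", "towatch"}

def split_tag_lines(tags):
    """Separate tags meant for the community from tags meant only for the owner."""
    shared = [tag for tag in tags if tag.lower() not in PERSONAL_TAGS]
    personal = [tag for tag in tags if tag.lower() in PERSONAL_TAGS]
    return shared, personal

shared, personal = split_tag_lines(["folksonomy", "toread", "indexing"])
print("shared:", shared)
print("personal:", personal)
```

Only the shared line would feed the folksonomy, while the personal line stays useful to its creator.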

Benefits of Folksonomies

Folksonomies bring together many points of information that have been collected by diverse groups of people. These people are unknowingly working in concert to make something that is bigger than each individual. Every participant can reap the reward from this effort as well, as folksonomies encourage information discovery (Surowiecki, 2004). Of course it is important to point out that this is not a closed system, and it is possible for users to corrupt the set of data by performing what is known as tag fraud. If false data is entered to lure unsuspecting users to sites that are designed to infect computers, or to entice them into spam-like schemes, then the system does not benefit. Reporting mechanisms need to be in place to ensure that this behavior does not continue unchecked. This is a problem that popular online systems must face: as they grow, so does the number of people trying to take advantage. It is also possible to create a system that requires user authentication when an account is first created. This is traditionally done by emailing the user a link to a verification page. It is easy to talk about the benefits of folksonomies, but it is also wise to consider the dangers.

Fixing Folksonomies

The main problem with folksonomies is coincidentally their greatest feature, namely the simplicity of tag creation and information gathering. When offering potential fixes to the folksonomic problem, article authors are usually quick to mention that changes should be minimal and made in a gradual manner (Brooks & Montanez, 2006). The Web is immense, and professionals simply do not have the time to index everything on the Internet (Golder & Huberman, 2005). Tagging is an open source type of solution that can be utilized by anyone on the Internet. Tags can be created during resource discovery by using bookmarklets or Firefox extensions that allow users to make tags without leaving the site they are on (Hammond & Hannay & Lund & Scott, 2005). For folksonomies to be most beneficial they need to overcome some of the problems that have been pointed out. How to instruct the user to make better tags is one of the issues that must be addressed.

Several researchers at the University of North Carolina undertook a program to educate authors in the construction of metadata for their own web content. This group of users was given a half-hour course in Dublin Core metadata creation, and then tested by letting them tag their own content. The result of this study showed that users can be taught to create good tags that are similar enough to ones created by professionals. There is no substitute for years of training, but some instruction given to the content author was the next best thing (Greenberg & Pattuelli & Parsia & Robertson, 2001). Unfortunately every tag creator cannot be sat down in a classroom and given a lecture about the finer points of polysemous distinction (words that carry several related meanings). The answer might lie in a voluntary tutorial made available to the user in an unassuming fashion. If ideas that encourage the productive evolution of tag semantics are propagated easily, that might solve today’s set of issues. Proponents of folksonomies will be quick to point out that this is a natural phase for the language to go through before a grammatical solution is presented (Guy & Tonkin, 2006). The quick and easy way that tags can currently be added to objects as index points on the Web certainly points to their usefulness as a tool. Future improvements will be made as the community decides changes are needed. Del.icio.us has already started offering a list of suggestions for tag creators to choose from when constructing an entry (Hammond & Hannay & Lund & Scott, 2005). These suggestions are based upon entries that have been created by other users. Unalog (http://unalog.com/), another folksonomic site, has a hint page for searching, but they might consider a hints page for tag creation.
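Suggestions based on other users' entries can be sketched as a simple popularity count over the tags already attached to the same URL. This is a guess at the general idea, not a description of how Del.icio.us actually computes its suggestions; the data layout and function name are hypothetical.

```python
from collections import Counter

def suggest_tags(url, postings, limit=3):
    """Suggest the tags other users have most often applied to the same URL."""
    counts = Counter(
        tag
        for posted_url, tags in postings
        if posted_url == url
        for tag in tags
    )
    return [tag for tag, _ in counts.most_common(limit)]

# Hypothetical postings: (url, tags chosen by one user).
postings = [
    ("http://example.org/paper", ["folksonomy", "tagging"]),
    ("http://example.org/paper", ["folksonomy", "tagging", "web2.0"]),
    ("http://example.org/paper", ["folksonomy"]),
    ("http://example.org/other", ["indexing"]),
]
print(suggest_tags("http://example.org/paper", postings, limit=2))
```

Offering the community's most common choices nudges new tag creators toward existing spellings and forms without requiring any formal instruction.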

Another suggestion has been offered that would combine the benefits of a hierarchical system with the flat structure of tagging. Researchers in the Computer Science Department at the University of San Francisco have begun researching a way to examine a cluster of tags that point to a reference, and combine that information with automated indexing of the site (Brooks & Montanez, 2006). This process can be performed by a series of computer “agents” that automatically find and auto-tag a web page (Godwin-Jones, 2006). The inferences made between the two sets can be used to create a pseudo-hierarchical structure that can provide relationships between terms. The obvious benefit of this proposal is that users would retain the open structure provided by tags while also gaining the relationships generated by this process.
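As a rough illustration of how relationships between terms might be inferred from flat tags, the sketch below uses a simple co-occurrence heuristic: a tag is treated as a broader term when it covers more items and nearly all items carrying the narrower tag also carry it. This is a stand-in chosen for brevity and is not the clustering method used by Brooks and Montanez; the threshold and sample data are assumptions.

```python
def infer_parents(tagged_items, threshold=0.8):
    """Guess broader terms for each tag from how often tags appear together."""
    index = {}
    for item, tags in tagged_items.items():
        for tag in tags:
            index.setdefault(tag, set()).add(item)

    parents = {}
    for child, child_items in index.items():
        for parent, parent_items in index.items():
            # A broader term must be a different tag used on more items.
            if parent == child or len(parent_items) <= len(child_items):
                continue
            overlap = len(child_items & parent_items) / len(child_items)
            if overlap >= threshold:
                parents.setdefault(child, set()).add(parent)
    return parents

# Hypothetical tagged items.
tagged = {
    "a": {"animal", "cat"},
    "b": {"animal", "cat", "kitten"},
    "c": {"animal", "dog"},
}
print(infer_parents(tagged))
```

With this sample data kitten is placed under both cat and animal, giving back the kind of relationship that the flat tag structure does not record on its own.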

Conclusion

Folksonomies are a hot topic right now in the realm of resource sharing and discovery on the Internet. Whether they will stand the test of time, or vanish back into the dust from whence they came, has yet to play out (Tonkin, 2006). The advantages they have are many: they are simple to create, easy to access, and easy to edit. The disadvantages are controversial and may have been overstated in an effort to help the community grow towards a better developed system. This process is analogous to hacking into computer systems to illustrate to site owners that their resources are not secure. With concerted community effort, the massive voluntary indexing of the Internet may prove to be the future of the Internet’s organizational structure.

References

Abrams, D. & Baecker, R. & Chignell, M. (1998). Information Archiving with Bookmarks: Personal Web Space Construction and Organization. In Proceedings of the SIGCHI conference on Human factors in computing systems. Retrieved on July 14, 2006 from the ACM Digital Library.

Brooks, C. H. & Montanez, N. (2006). Improved annotation of the blogosphere via autotagging and hierarchical clustering. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23 – 26, 2006). WWW ’06 (pp. 625-632). New York, NY: ACM Press.

Browne, G. (2006, May 23). RE: plurals using (s). Message posted to index-l listserv, archived at http://lists.unc.edu/read/messages?id=3400320#3400320

Chudnov, D. & Barnett, J. & Prasad, R. & Wilcox, M. (2005). Experiments in academic social book marking with Unalog. Library Hi Tech, 23 (4), 469-480. Retrieved on June 28, 2006 from the database: Emerald Insight.

Fidel, R. (1994). User-Centered Indexing. Journal of the American Society for Information Science, 45 (8), 572-576. Retrieved on July 11, 2006 from the database: Wiley Interscience Journals.

Golder, S. & Huberman, B. (2005). Usage Patterns of Collaborative Tagging Systems. Journal of Information Science, 32 (2), 198-208. Retrieved on July 11, 2006 from http://www.hpl.hp.com/research/idl/papers/tags

Godwin-Jones, R. (2006). Tag Clouds in the Blogosphere: Electronic Literacy and Social Networking. Language Learning & Technology, 10 (2), 8-15.

Greenberg, J. & Pattuelli, M. & Parsia, B. & Robertson, D. (2001). Author-Generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization. Journal of Digital Information, 2 (2), Article No. 78, 2001-11-06. Retrieved July 15, 2006, from http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Greenberg

Guy, M. & Tonkin, E. (2006). Folksonomies: Tidying up Tags? D-Lib Magazine, 12 (1). Retrieved on June 20, 2006 from http://www.dlib.org/dlib/january06/guy/01guy.html

Hammond, T. & Hannay, T. & Lund, B. & Scott, J. (2005). Social Bookmarking Tools (I): A General Review. D-Lib Magazine, 11 (4). Retrieved on July 14, 2006 from http://www.dlib.org/dlib/april05/hammond/04hammond.html

Lancaster, F. W. (2003). Indexing and Abstracting in Theory and Practice. Champaign, IL: University of Illinois.

Mulvany, N. (2005). Indexing Books. Chicago: The University of Chicago Press.

Surowiecki, J. (2004). The wisdom of crowds: why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations. New York: Doubleday.

Tonkin, E. (2006). Folksonomies: The Fall and Rise of Plain-text Tagging. Ariadne, 47. Retrieved on June 28, 2006 from http://www.ariadne.ac.uk/issue47/tonkin/intro.html

Eluna 2006 presentation

Betsy Simpson (Head of Cataloging at UF) and I presented at ELUNA this year. The title of our talk was Statistics for Cataloging: using Triggers and macro express.

Essentially we talked about how we are using triggers at UF to attach tags to our Holdings records. The metadata underlying these tags include Cataloging Unit, Cataloger, System # and Holdings #.

Combined with the predefined tags a report can be generated that gives useful information from the point of cataloging.

We use Macro Express to ensure the defined vocabulary is used correctly and uniformly. When collecting the reports, Macro Express is used to bring in additional information from the bib record, like the Collection, sublibrary, and call# of the item.

The presentation went very well, and we look forward to refining the process described at the talk.

DVD-VID reclass lists

MPOW decided to reclass the main library’s documentary films. In the past, feature films received an accession number while documentaries were cataloged using LC. The decision has now been made to convert all of the LC titles so they can be integrated into the list of feature films. I can see how this would be desirable for staff access etc… but I can also see how the patron is losing a certain amount of access to categories. I just finished writing the macro that converts a report in Aleph to a list of titles with all the needed information (System#, Holdings#, barcode, Title, Call#, Item description). I then created a macro that assigns an accession number to the items in the list. The trickiest part was skipping titles in a series (pt.1-5 etc…). That wasn’t too hard though; I just wrote some code that compared the call#s and only increased the number assigned if there was not a match. This worked out well, and I have the lists ready to go to Copycat, where the labels will be made, and then to Access Services for labeling.
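The numbering logic can be sketched roughly as follows. The real work was done in Macro Express against an Aleph report, so this Python version is only an illustration; it assumes the list is already sorted so that the parts of a series sit next to each other and share an identical call#, and the field names, call numbers, and starting number are made up.

```python
def assign_accession_numbers(rows, start=1):
    """Number the rows in order, reusing the number when the call# repeats."""
    numbered = []
    number = start - 1
    previous_call = None
    for row in rows:
        # Only move to the next number when the call# differs from the last row.
        if row["call_number"] != previous_call:
            number += 1
            previous_call = row["call_number"]
        numbered.append({**row, "accession": number})
    return numbered

# Hypothetical report rows, sorted by call#.
rows = [
    {"title": "Documentary A", "call_number": "QH541 .A1 2001"},
    {"title": "Series B, pt.1", "call_number": "E468 .B2 1990"},
    {"title": "Series B, pt.2", "call_number": "E468 .B2 1990"},
    {"title": "Documentary C", "call_number": "TR140 .C3 1999"},
]
numbered = assign_accession_numbers(rows, start=100)
for row in numbered:
    print(row["accession"], row["title"])
```

Both parts of the series end up with the same accession number, and the next title picks up the following number without a gap.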

I will next write a macro that works in the catalog to alter the call# in the holdings to reflect the assigned accession number. The old (LC) call # will be placed in a non-public z subfield in the 852, because it is assumed (by catalogers) that next year they will want to reclass the videos again.