Flying high on the web or in free fall?
Rich Wiggins MSU
Microsoft, Elsevier, and Google representatives on the panel
The future?
E: dynamic market; the amount of info available is phenomenal. Working on federation and analysis. Peer review question: are publishers becoming more important? In the academic market this poses some challenges:
- versioning
- quality (peer review)
- archiving
scirus
giving structure to the data.
M: wants to broaden the question to how it is relevant to the academic librarian. Has broad interest and appeal. Users don’t fit the profile of the typical academic researcher. Queries span different regions. Excited about content from publishers or open access repositories. Making it available through Academic Live search is what they are trying to do.
Question? what kind of searches are being done?
(from audience–science searches)
M:seeing searches from a wide audience.
G: (on the question of academic searching) -sharing a story- as an undergraduate on a small campus in India he was trying to find faster ways to multiply small numbers on chips. Went to the library, looked up articles, followed citations, sent his work in, and reviewers told him it was nice but where had he been for the last five years (laugh)
Goal of what they are trying to do is make it possible for anyone anywhere to find research wherever it is produced throughout the world.
-Q: from Donna (LC) in the audience- in terms of scholarly publishing finds that audience is going to OA or blogs what do you have to say about scholarly publishing on blogging what should she do?-how do we capture scholarly info.
G: from Google’s viewpoint, the first thing is making sure the formal information comes first (from Google Scholar POV)
Donna:trying to capture serious scholarly blogs that actually have scholarly info- how is google keeping up with this new trend?
G: what is the problem.
from the audience: blogs are ephemeral but they might have good academic material.
E: so what you are saying is that there are new modes of publication, I would recommend that you go to scirus because they don’t just look at traditional publication types. important to make a distinction between blogging and peer reviewed material and scirus is able to do this.
M: come up with way to identify blog as scholarly so it can be cataloged and made discoverable. =standards
Audience: how to preserve blog content
Barbara: this happens with scholarly journals as well. failure of archiving
E:challenges of new era -archiving and versioning already mentioned (see above) that is why we have peer review process
(scirus seems like a unique thing, how do you address what is on the web?)
versioning and archiving is responsibility of publisher.
Barbara: Google Scholar approach: struck by the fact that when you cluster material you aren’t limited to types of products. Do you follow into blog country to pursue the author’s thoughts?
G:archiving significant problem
Barbara:frequently google’s cache returns things that have been gone
host: wife’s cat has a thyroid problem; radiating the thyroid kills it and another gland takes over. Google search for radioactive cat. How do we find important blog articles?
g: important articles will rise to top
host:what about controversy
g:if we can analyze it then the one with more people believing in it will rise to the top. we have to depend on what the people in the field talk about.
m: uses similar techniques for relevancy ranking.
host:serendipity. throwing random things into hitlist?
M:no.
e: agrees, relevancy is difficult to compute. trying to show results from different source, scirus has keyword terms to give clickable related terms.
host: theory of comprehensive search. A woman was part of a clinical setting, and toxic effects had been shown that were not indexed online. The doctor only looked online and was not able to find this info, so the woman was killed.
e: crucial question. Difference between search engines and A&I databases: an A&I database goes back and enters information. In terms of comprehensiveness, a search engine can’t really do anything about this.
host: if your relative had complicated condition what databases would you search.
e: abstracting & indexing database; the material is determined by editorial design. Would not google it.
m: focused on historical material in printed form, only a matter of time before information made available. Start with brother who is a surgeon.
G: I would search everything, every possible thing no matter where. Would not stop at google.
Barbara: contact medical librarian.
from audience: controlled vocabulary? And date searching?
g: search logs show no vocab based searches so doesn’t think it is very important, there are issues because people don’t know what problems are.
Donna: the law, where authoritativeness matters, when the most important piece of law could mean someone going to jail or not. Working on something bringing all the different pieces of law from around the world into one place and using an authoritative language to go behind it. They help you get over asking the right question; what you don’t know doesn’t hurt you. Software and hardware.
if you type in a particular term then a sidebar could pop up that have the related terms
Donna:knowledge in thesaurus beyond what any one person may know.
e: for example if you type in heart attack you get cardiac failure as well. Thesaurus important. In scirus there is a controlled vocabulary.
m:good feature idea for navigating through information. Looking at different types of taxonomies. writing software to draw vocab from text. investing in this idea.
g: the problem of concepts referring across different languages and fields is complex and not easily resolved by using ontologies. You can go back and look at common use and scenarios: when a new concept is introduced it is described in previous terms, and then it is named and referred to by the new term from then on. Simplest thing to do is expand queries. Keeping track of drift and influence across different fields is hard to achieve, and using ontologies does not solve it.
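A minimal sketch of the query expansion discussed above (type in heart attack, get cardiac failure as well). The thesaurus here is a hypothetical one-entry dictionary built from the example in the session, and `expand_query` and `to_boolean_query` are illustrative names, not anything the panelists described about their own systems.

```python
# Hypothetical thesaurus: lowercased term -> related controlled-vocabulary terms.
# The single entry comes from the example mentioned in the session.
THESAURUS = {
    "heart attack": ["cardiac failure"],
}


def expand_query(query, thesaurus=THESAURUS):
    """Return the user's query followed by any related thesaurus terms."""
    original = query.strip()
    terms = [original]
    for related in thesaurus.get(original.lower(), []):
        if related not in terms:
            terms.append(related)
    return terms


def to_boolean_query(terms):
    """Join the terms into a quoted OR query a search box could accept."""
    return " OR ".join(f'"{t}"' for t in terms)


if __name__ == "__main__":
    print(to_boolean_query(expand_query("Heart Attack")))
```

The same list of related terms could just as well feed the clickable sidebar suggested earlier instead of being added to the query automatically.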
host: if I search dna testing how are you mapping it to say “did you mean” kind of improvements.
g: collaborative filtering: no
m: no,
e: no, but it is an area where search engines might be moving to make use of the community that is using it.
host: are you capturing click throughs?
e: we don’t have log analysis on which pages are being clicked on
m: yes
g: of course
Barbara: Q? The problem with taxonomies is that they don’t solve the problem by going back in time, but what about clustering, relevance ranking in a certain field of expertise? If I want to use equity as a legal term that is one grouping, but if I want to use it as a company that is another search. I would like to be able to narrow the search. Northern Light and Ask.com do this.
host: so disambiguation
e: different approach: all documents are classified by a scheme, so you can be sure that you are only searching within a specific scheme. Based on a training set: took articles from a classification they knew and used the terms to reclassify documents in Scirus
m:no we don’t but it is an interesting term. difficult to solve
g: cluster instances of same work, tricky to figure out right granularity for it to be good for the user, attractive and easy to use for specific or artificial examples, to do it across the board and so it is easy for the user would be quite difficult
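A rough sketch of the disambiguation idea from this exchange: if every document carries a classification label, a search for "equity" can be narrowed to one grouping. The documents, the category labels, and the `search` function are all made up for illustration; how Scirus actually assigns and filters by classification was not described in this detail.

```python
# Hypothetical pre-classified documents (titles and categories are invented).
DOCS = [
    {"title": "Equity and trusts in common law", "category": "law"},
    {"title": "Private equity fund returns", "category": "finance"},
]


def search(query, docs, category=None):
    """Return titles containing the query, optionally limited to one category."""
    q = query.lower()
    return [
        d["title"]
        for d in docs
        if q in d["title"].lower()
        and (category is None or d["category"] == category)
    ]


if __name__ == "__main__":
    print(search("equity", DOCS))                  # both senses mixed together
    print(search("equity", DOCS, category="law"))  # legal sense only
```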
host: date of document
e: in scirus you can,
m:you can sort by date,
host: problem is the web site doesn’t tell you when items were last updated.
g: yes, they have to be able to see if pdf and doc are the same. Continuum. Question: is it useful as a result? If not it should be removed. How to figure out the date: in some cases you can, depends on the data; in academic scholarly cases it is possible because it says so. Harvest full text and analyze the document itself, do not rely on info provided to them. Possible in some cases to do a reasonable job.
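One common way to check whether two harvested files (say a pdf and a doc) carry the same text is to compare overlapping word sequences (shingles) with Jaccard similarity. This is a sketch of that general technique only; the session did not say how Google does it, and the shingle size and the 0.8 threshold are arbitrary assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in the text, ignoring case and layout."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_work(text_a, text_b, threshold=0.8):
    """Guess whether two extracted texts are copies of the same document."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


if __name__ == "__main__":
    pdf_text = "Faster multiplication of small numbers on chips"
    doc_text = "faster  multiplication of SMALL numbers\non chips"
    print(same_work(pdf_text, doc_text))
```

Because the comparison runs on the extracted text itself, it does not depend on anything the hosting site says about the file.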
audience: do none of you search informal scholarly literature. what sources.
g: can you define informal literature – audience:blogs
m:no blogs
e: we don’t focus on blogs, but if we determine it is scientific then it is absorbed.
host: who decides if something is scholarly?
e: analyze terms in documents and based on that (vocab) they are marked as scientific, use relevancy score.. it is robotic
host:will satirical scientific journal be included
e:great trust in algorithm
g: at this point they don’t include blogs because they don’t feel confident that they can decide what is valuable.
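To make the "analyze terms in documents" answer concrete, here is a toy sketch of vocabulary-based scoring: count what share of a document's words come from a scientific word list and mark it scientific above a cutoff. The word list, the 0.1 threshold, and the function names are assumptions for illustration; the actual Scirus algorithm was not described beyond being term-based and robotic.

```python
import re

# Hypothetical vocabulary of terms treated as signals of scientific writing.
SCIENTIFIC_TERMS = {"hypothesis", "method", "results", "experiment", "analysis", "data"}


def scientific_score(text, vocab=SCIENTIFIC_TERMS):
    """Return the fraction of words in the text that appear in the vocabulary."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in vocab)
    return hits / len(words)


def is_scientific(text, threshold=0.1):
    """Mark the text as scientific when its score reaches the threshold."""
    return scientific_score(text) >= threshold


if __name__ == "__main__":
    print(is_scientific("The experiment results support the hypothesis"))
    print(is_scientific("my cat sat on the mat"))
```

A scorer like this looks only at vocabulary, so a satirical piece written in scientific language would score the same as a real article, which is the concern behind the host's question.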
audience:pre print server issues?
g: yes they
m: yes actively try to find OA
e:yes if it is scientific
host:self archiving issue is something to ponder
audience: what if A&I opened their info to you, what would you do with it? (structured xml)
m:we would love to have high quality metadata.
e:work with constructing, medline database in scirus and use data in there
g:would take it in a heartbeat.
audience:that is the problem, what search engines are not getting is the problem. disconnect between electronic publishing systems and getting it into the search engines.
e: we work closely with publishers and we don’t agree with that. The Open Archives Initiative is making good inroads. We work with 10 publishers and just signed an agreement with CrossRef: metadata and then full text, using a protocol developed by CrossRef. Estimate is that there are 4000 STM publishers.
g: if a&I were able to provide data in xml then they would take it in a heartbeat. dealing with the
barbara:bottom line. services handling info needs when librarians are not there. what can we do to help you do a better job, for example open access good for everyone should we contribute to lobby for this?
G:the bills are paid by the librarians, if you want a comprehensive search , content that is licensed must be indexed, it will improve search dramatically.
m: provision of content and make sure it indexed, putting structure on content, classification of blogs for example and publishing of metadata etc..
host: interesting to see who defines standards
e:working together with librarians is key. they are the experts on what users need, way IR are licensed is a cooperation. no one disappearing.
audience:synergy between formats? (pdf/html)
m:xps,
e:no one solution pdf is dominant form right now
audience: user behavior observances?
e: people start broad, they are afraid to miss information. they want tools to help them refine, they have a few products they go to for information.
m: breadth of use interested in scholarly works
g: the only unusual thing is that they are different from normal searches. Advanced search page is not used very often, only to limit the time period to the last five years.
il2006 “academic search” future library