Saturday, June 15, 2013

8 things we know about web scale discovery systems in 2013

Web scale discovery systems as a class of product have existed for over 4 years, and there has been rapid adoption by academic libraries around the world. We are currently way past the early adopter phase, and probably deep into, if not past, the early majority phase.

Some of the early leaders in this space like Summon are even announcing a "2.0" version, which may or may not be marketing hype but is symbolic I guess in signalling that products in this class have reached a certain amount of maturity.

Today in 2013, Summon alone has over 500 libraries using it, and many more are using WorldCat Local, Primo Central, EBSCO Discovery Service etc. As usual, this has led to a rise in professional literature on the topic (see list curated by me here), covering a host of areas including
  • impact on resource usage (full-text downloads, print catalogue items, A&I usage, articles not indexed in discovery)
  • impact on workflow for management of eresources
  • proper marketing and positioning of discovery products for users
  • impact on teaching of information literacy by librarians
  • surveys on attitudes of librarians, undergraduates, graduate students and faculty towards discovery services vs databases
  • usability testing & integration of discovery services into library websites
and many more.

With all this literature out there, what do we really know about web scale discovery services in 2013 that we didn't know in 2009  and what are some issues where the jury is still out?

Some qualifiers.

First, I don't profess to know all the answers or have read or even remembered every study done on discovery services, nor am I an "expert", though I have kept my eye on this interesting area.

Second, I have far greater familiarity with Summon (which we tested and implemented in our institution in 2011-2012) and to some extent EDS, so what I write might apply only to Summon (e.g. I wonder if the EDS interface, with more advanced features at the cost of a crowded user interface, would mean advanced users are more satisfied). Still, I suspect that at a general high level the web scale discovery services on the market are similar enough that most statements apply to all of them.

Third, I am going to speculate based not just on the literature but also on my own knowledge and feel of what the general consensus is (which might be wrong).

I hope this post can lead to some fruitful discussion, even if you disagree with what I have written.


1. Web Scale Discovery Services increase accessibility of eresources and will definitely on the whole increase full-text downloads

This seems to be the result that is most robust and uncontroversial. Every library that has implemented discovery services has reported that, on the whole, usage of eresources has gone up.

Detractors might say users may be downloading more, but do they actually find what they need? Or even if they found something that is good enough, is it the best? That's a (possibly) fair but different point.


2. Undergraduates generally love discovery services
Again, this is a point that I believe is mostly accepted. Survey after survey has shown undergraduates are generally happy with discovery services because they mostly fit their mental models by functioning somewhat like Google. Are they perfect and do all undergraduates like them? Of course not, but on the whole, libraries that have surveyed users have mostly obtained positive feedback compared to existing catalogue or search tools. This is of course unlike the results for federated search in the past.


3. Librarians' reactions towards discovery services are mixed at best.

The earliest study I am aware of that surveyed librarians' reactions to Summon reported "culture shock". This seems to be the default reaction of librarians who encounter discovery tools for the first time. Of course, this was at one early adopter library in Australia, back when the concept of discovery tools was still novel to the profession, and the study itself suggests, based on a follow-up survey 6 months later, that as librarians get used to the concept they become more positive towards the tool.

However, more recent studies on librarians' attitudes towards information literacy such as this one and this one suggest that librarians' attitudes towards discovery tools are still polarized or ambivalent, whether when using them and recommending them to users at the reference desk or when teaching in classes. Attitudes range from enthusiastic support (see the series of free recorded webinars on Summon and information literacy adopted by librarians) to acceptance (sometimes grudging) to extreme opposition, for instance claiming that teaching Summon is "a dereliction of duty reference librarians have towards their users" - one of the more extreme statements found in the literature.

Based on discussions with librarians both within and outside my institution, I can verify as well that there are many highly qualified reference librarians who dislike discovery services intensely and not out of mere ignorance or resistance to change.

There have been a couple of blog posts and papers trying to explain this resistance. Here's my take combining reasons I have seen given in various papers with my own thoughts.
  • Relevancy ranking results can be inconsistent if not awful (opinions vary on how bad this issue is, possibly depending on expectations, implementation and discipline). 
  • Lack of advanced search features 
  • Worry that some important material is missing from the index, or that coverage in some disciplines is totally inadequate. Related is the view that a subject-specific database is almost always better, e.g. PubMed.
  • Worry that users are unaware that they are missing out material not found in the index, and they may settle for good enough instead of the best available
  • Worry that discovery services are damaging information literacy skills by misleading users into thinking research is easy
  • Technical issues relating to instability of linking to full-text, clarity of labels in the interface etc
  • Uncertainty on how to position discovery systems next to databases and how to teach
  • Worry that libraries are handing over too much power to discovery services due to lockin by discovery service providers who are simultaneously content providers (example of recent dispute).
Each point can be of course expanded further, for instance relevancy itself can be a big area, with some librarians unhappy about the weighting of content types (newspaper articles appearing too often instead of books) while others are unhappy with the overall relevancy ranking for known item searches.


4. Advanced searchers generally mirror the attitudes of librarians and are not as satisfied 

As expected, experienced researchers and faculty staff generally mirror the opinions of librarians and they are a lot less enthusiastic than undergraduates in general because they are familiar with what databases offer and are more demanding on what they should get.

That said, the Ithaka Faculty Survey 2012 speculates that libraries' heavy investment in discovery services is paying off, leading to more faculty starting their search from the library catalogue in 2012, the first time this figure has increased since the survey started in 2003.

As Barbara Fister points out, faculty staff are often searching "for known items, something discovery systems seem to handle rather badly", so this explanation seems off.


5. Relevancy ranking can still be improved

This differs from service to service with some services claiming superiority in this area.

Head-to-head tests give mixed results, e.g. this gives victory to EDS over Summon, this to Summon, this simple one gave A&I>Summon>Google Scholar, but this one gave it to Google Scholar over Summon etc.

But I doubt most librarians would say Summon or any other discovery service is as good as it can be; most yearn for better relevancy.

I am personally more sympathetic towards discovery systems in this area, though having spent countless hours studying and duplicating thousands of user searches since June 2012, I am well aware of how poor the relevancy ranking of Summon can be on some searches (I have also done limited testing on other systems).

Lest I be accused of not giving examples, here's one: Singapore "national service", where currently the first 9 results are totally irrelevant. Though one example hardly proves a pattern, I am sure any librarian familiar with discovery services can give dozens of examples similar to this one. But of course, relevancy isn't an easy problem to solve and, to be fair, in this case doing the same search without quotes actually gives you better, though still poor, results.

Compared to the early days when discovery services raced to sign up content providers and boasted of the size of their index (they still do I guess), there is an increasing realization by all parties, whether librarians or discovery providers, that all this content can be counter-productive if the relevancy ranking isn't capable enough to surface the right or at least decent content.

Also, as mentioned before, there were doubts in the early days about how good such systems are for known item searching, particularly for catalogue items, and this continues to this day despite improvements.


6. Adding Federated search does not add much to web scale discovery (currently)

This is somewhat more controversial. But I believe the current consensus is moving towards the idea that tacking federated search on to web scale discovery is not that useful, at least with current implementations. An early debate in 2009 was sparked on the Federated Search Blog with the post Beyond Federated Search and follow-ups, which critiqued Summon for lacking federated search, claiming that a hybrid solution of indexing what you can and doing a broadcast search (federated search) over what you can't should be the way to go.

I could be wrong, but my impression is that many libraries that implemented Ebsco Discovery Service which does have federated search, have chosen to turn off the federated search portion, basically because it wasn't used and/or was counterproductive.

Federated Search is Dead -- and Good Riddance! , a piece explaining why James Madison University (JMU) turned off the EBSCO Integrated Search federated search add on included in EBSCO Discovery Service is perhaps a typical reaction.

Essentially the sheer size of the index of discovery services like EBSCO Discovery Service means that students have no incentive to wait 30 seconds for more results; the problem they typically face is too many results, not too few. Scholars will already be using traditional databases as their primary search tool (e.g. Scopus) and may just use web scale discovery tools as a final round-up of what they have missed, so they don't have a burning need to see results from such traditional databases there anyway.

I would say even EBSCO is downplaying the significance of the federated search option in EDS, as their pages on EDS do not mention federated search at all (though to be fair it's a separate product, EHIS), and there is even a page on platform blending (where I frankly don't quite understand what is going on, despite a vendor explaining it to me) where they go out of their way to state it is "not federation".

Of course, an argument could be made (correctly I think) that the idea of a hybrid system is sound but the implementation needs a lot of work to make it worth it, but currently it seems of the 4 major players in the market none seem to have cracked this issue yet and may not do so in the foreseeable future as it is perhaps not a priority.

I would also add that many of the issues brought up back then about the dangers of ceding control to your discovery provider over what content can be found if you don't do federation may still retain their teeth (again see the recent spat), but at the very least on the practical front, the inclusion of a federated search option in a web scale discovery system generally isn't considered critical by most librarians now.


7. Content providers are generally eager to cooperate with discovery vendors to have their content indexed. 

One of the reasons why the need for federated search seems to have diminished is that more and more content is getting indexed. In 2009, there was still uncertainty about how content providers would react (would they want to be included?) and discovery vendors had to work hard to get content included. If most did, then federated search would be of limited value except for reasons related to currency of results. If most couldn't be indexed, then federation would be crucially important to get at those resources.

As of 2013, the situation has clarified. Over the years, as more libraries started to release data showing that usage tends to fall for anything not in discovery services and, conversely, that anything indexed in them sees increased usage, content providers have become more and more eager to be indexed or risk being cut out of the game.

The earlier mentioned James Madison University paper is perhaps instructive. Of the sources the author described as accessible via federated search back in 2010, many like JSTOR, Sage, ScienceDirect etc. are now indexed in Summon and probably other discovery services.

More interestingly, even A&I services like Scopus, Web of Science, MLA and ERIC are often included in many discovery services now, though with appropriate safeguards to ensure their records are shown only to authenticated users.

That said, there are still hold-outs. The well-known PsycINFO, EconLit and other A&I databases that work with EBSCO Discovery Service only are perhaps the most gaping hole currently existing.

And of course the above refers only to publishers. In general, aggregator databases have been less willing (Gale seems to be an exception here, having been included in Summon since 2009 and recently added to EBSCO Discovery Service as well as others); in particular, those owned by ProQuest and EBSCO are typically out of bounds to competing discovery services, barring some special agreement.


8. Problems of broken links are still an issue though the problem is less serious and likely to be so in future

One of the greatest issues with discovery services is that they typically rely heavily on OpenURL to get to the full text. As is well known, OpenURL linking is not 100% reliable, so discovery services have put in place alternate routes to full text.

For example, Summon implemented "Index-Enhanced Direct Linking" and EDS has its smart links (if content is in the EBSCOhost databases) or custom links (I believe equivalent to Summon's Index-Enhanced Direct Linking in most cases).

That said, linking to newspaper articles, non-journal items and free content can still be iffy.
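To make the fragility a bit more concrete, here is a minimal Python sketch of how a link resolver URL is assembled from citation metadata. The key names follow the OpenURL 1.0 key/encoded-value journal format as I understand it; the resolver address, the `build_openurl` and `missing_fields` helpers and the sample citation are all made up for illustration and are not taken from Summon or EDS.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; every library has its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

def build_openurl(citation, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 (KEV) link from whatever metadata is available."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        value = citation.get(key)
        if value:  # empty or absent fields are simply left out of the link
            params["rft." + key] = value
    return base + "?" + urlencode(params)

def missing_fields(citation, required=("issn", "volume", "spage")):
    """List the fields a resolver would commonly want but the record lacks."""
    return [key for key in required if not citation.get(key)]

journal_article = {
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "issn": "1234-5678",
    "volume": "12",
    "spage": "34",
    "date": "2013",
}
newspaper_item = {"atitle": "An example news story", "date": "2013-06-15"}

print(build_openurl(journal_article))
print(missing_fields(newspaper_item))
```

The newspaper record has no ISSN, volume or start page, so the resolver has very little to match on, which is roughly why such items tend to link less reliably than journal articles.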

Still, new efforts like KBART and Improving OpenURLs Through Analytics (IOTA) are underway, so hopefully this issue will be reduced in time to come.


Others

Some other less important findings,



Conclusion

I confess it took quite a bit of effort and courage to get this piece written and posted. Sometimes I wondered if I was getting the general consensus totally wrong, and yet other times I thought what I wrote was so trite and obvious that people knew it right at the start in 2009.

I suspect the latter is more likely to be correct, because I decided to err on the side of caution and list the statements I thought were definitely agreed upon and bump the ones I was unsure about to a follow-up blog post, "X things we still are unsure about web scale discovery systems in 2013".

But what do you think? What else do we know about discovery services that was in doubt in 2009?

Monday, May 27, 2013

Zombies and libraries - how are libraries using the zombie theme?

Organizations as serious as the CDC use zombie outbreaks as a fun way to educate the public, so why not libraries?

And indeed zombies are a really popular theme for libraries, particularly for orientations. Both academic and public libraries are using this theme to add some entertainment to traditionally boring orientations.

You can have a feel of how popular such a theme is with libraries by just looking at the following Flickr search.

It's unclear to me how much effort it would take to do some orientations, but here's an interesting Prezi by Central Methodist Libraries that talks about how to engage students using pop culture and mentions specifically "big games" which apparently is the term used for such events.

I looked around and here are some interesting library uses of the zombie theme that I found.


1. Zombie Guide to Miller Library!






Fun lively comic. Students and a librarian are attacked by Zombies and they need to find needed information as fast as possible on how to save themselves in the library. Do they know enough of the library classification system to quickly find what they need?

Interestingly enough the Miller Library at McPherson College uses the Dewey decimal system rather than Library of Congress Call number system which is I think more common for academic libraries.


2. RMIT Library Amazing Race Zombie Edition Orientation 




Above is a very amusing video on the event held Feb 27 at RMIT University Library. I believe this library has been doing zombie editions for at least 2 years?

Beyond that I don't have many details about the actual physical event held, though you can see some photos on their Facebook page and an infographic showing that 190 did the physical game and 717 did the online version.

Here's the clever online version 



                      http://www.rmit.edu.au/library/amazing-race-online

A fun way to introduce users to various sites by the library, the short online game leads the user to various online sites by the library including libguides, youtube videos, Facebook/Twitter pages & library discovery system to find clues needed to complete the quiz to save themselves.




3. Marston Feed Your Brains, University of Florida Libraries.



Another intriguing zombie themed information literacy session by University of Florida. Much of it centered around the Zombie Survival LibGuide created by the library in 2010.

A full paper "The Library is Undead: Information Seeking During the Zombie Apocalypse" describing what they did and the reaction of users is available. More information is also available via a slideshare and podcast.





4. Lupton Library, University of Tennessee, Chattanooga - Nightmare On Vine Street



"Designed as a companion piece to an iPod-based building tour, the basic concept for the project was a horror-themed “escape the room” game. The entire group brainstormed the creepy scenario for the game: a student wakes up late at night in a study room on the top floor of the library and has to navigate their way out of the building by appeasing various librarian zombies encountered along the way." -- source

5. Zombies in the Library Calendar, Tea Tree Gully Libraries



2011 calendar Zombies in the Library.

Librarians dress up as zombies to create a calendar.







"Zombies invaded a south Auckland library yesterday, all in the name of literacy." goes the local news report.

I don't have many details, but it seems to be the typical idea of survivors racing to find information to protect themselves from zombies.

The event made local tv and there is an interactive text "choose your own adventure" type game using a wiki plus a Zombie apocalypse reading list.


7. Locating a Book (Brains on Books) - Western Illinois University Libraries.




Searching YouTube for zombie libraries finds quite a few videos on the subject. Leaving aside the Thriller spoofs, many are simply marketing videos explaining why you need the library (for information on what to do in an outbreak of course). Some are fairly well done but nothing in my opinion is particularly interesting, but the above one by Western Illinois University Libraries manages to sneak in library instruction....


8. Others

There are many, many other libraries using the zombie theme, e.g. the Zombie and Vampire themed Trek using the SCVNGR game by Ohlone College.

A quick search of LibGuides shows that there are easily 20+ LibGuides about zombies created by various libraries.

Typically named "Zombie Survival Guide" or similar, some merely use the theme to inject interest into the topic of information literacy, e.g. how to access information remotely in case of a zombie quarantine, coupled with fun listings of zombie-related literature, typically ranging from works like Pride and Prejudice and Zombies and tongue-in-cheek academic articles to the semi-serious The zombie survival guide : complete protection from the living dead.

Others, similar to the University of Florida's, are attached to a physical game of some sort, typically Humans vs. Zombies games, e.g. Webster University.

Another simple idea is just to have a "Zombie week" and book displays etc.

Conclusion

One thing that seems clear is that libraries using the zombie theme tend to attract media attention due to the coolness factor. But this will wear off. Some of the events have only a very small traditional library information literacy component. There are other benefits of course, such as engaging users and even raising the morale of library staff who get to let their hair down, but is it worth the effort?

How does one measure cost-benefit?

Have you done such zombie themes at your library? How did it go?



Monday, May 20, 2013

My experience visiting China for the Serialssolutions Greater China User Group Meeting

Last week, I had the amazing opportunity to visit Xi'an China from 9 May to 11 May 2013 to attend the Greater China SerialsSolutions' User Group meeting.



Regular readers of my blog will know that since 2011 I have been reading (see list of articles I am curating), thinking and blogging about discovery systems, leading up to the implementation of SerialsSolutions' Summon in 2012 in my institution.

I have tried to keep up-to-date with what pioneer librarians and libraries around the world have done with discovery and have interacted and learnt much from librarians in UK, US, Australia etc via Facebook, Twitter & blogs etc.

Obviously this was a very Anglo-Saxon view of things, but hard to avoid, given the nature of the social networks I was on.

But this User group meeting was in China! I was excited to have a chance to have contact with librarians in China to see what they were doing with Summon and learn about librarianship in China as I had never been to China before in my life.



Preparing for the User Group Meeting

Of course by now, I have attended a few library conferences overseas and am even fairly adept at giving talks at conferences (eg Internet Librarian International last Nov), but this time it was particularly tricky because the whole meeting would be in Chinese and I would have to present in Chinese.



For the benefit of international readers,  let me explain why that would be tricky.

While it is true that Singapore is majority ethnic Chinese (about 75%), and Singaporean Chinese like myself study Mandarin in schools as our mother tongue, English is our first language (though I bet it may not be apparent given my odd lapses in written and spoken English) and the medium of instruction in schools. We also use it at the workplace to communicate with all Singaporeans including non-Chinese Singaporeans.

We are supposed to be bilingual in theory but effectively for many including myself it works out that while I can use Mandarin for everyday conversations eg. to talk about shopping, food, movies, Chinese songs (I listen to Chinese pop songs as well as English ones!), I struggle when it comes to professional terms as I studied librarianship etc in English.

Quick, what's "catalogue" in Chinese? Or even "metadata"?

Initially it was suggested that I do the presentation in English and SerialsSolutions staff from China would translate (there were other presentations by American and Australian SerialsSolutions staff done that way) but I decided to stretch myself and try to give it in Chinese.

I generally don't write out every word I want to say in a presentation, though this time I thought it was prudent to do so. I translated what I wanted to say from English with the help of Google translate and additional help from colleagues from our Chinese Library but still ended up with a pretty simplified presentation because I thought it would be best to keep it simple given my limited command of Chinese.

As a sidenote, I was quite impressed by how well Google Translate worked. It was pretty good at translating even very technical terms, and while it sometimes got the grammar and word order wrong, it was usually spot on.

I also read a couple of articles on discovery in Chinese and this helped me pin down terms like "Unified search platform".

My institution also has a Chinese library, but I must admit that until recently I didn't really focus on Chinese language searches. Before leaving for China, I looked up what queries people were doing in Chinese (about 6-8% of queries were in Chinese).

I was also reminded of a feature of Summon that I had read about before but forgotten: changing the interface language doesn't just change the text labels of the UI, it also changes the search algorithm applied. In most cases, it seemed to make no difference to the ordering of search results, but in some cases you might get better results if you changed the interface to Chinese and searched in Chinese as opposed to searching in Chinese using the English interface.


The user group meeting






The User group meeting was hosted by Xi’an Jiaotong University at the Nanyang Hotel. I was nervous as I was the third presenter, after presentations by Peking University (the flagship Summon Library in China) and Xi’an Jiaotong University.





I was not sure what I expected but I did discover two things.

Firstly, I generally had no problems understanding the presentations even though they were in Mandarin (save one extremely technical presentation about some complicated custom integration of Summon with an OPAC system which I suspect would be difficult for me to grasp even in English).

When they said the term for say "relevancy ranking of search results" in Chinese, I had no problems knowing what they said, though the reverse doesn't apply and if I wanted to say that in Chinese  I often came unstuck :)

Secondly, it became apparent to me that the Librarians in China were mostly facing many of the same issues as librarians around the world.

I had no problems understanding and, in some but not all cases, nodding in agreement with the points made, e.g. the difficulty of selecting appropriate packages in 360 Core and relevancy ranking issues.




On the second day during the round table session, requests were made by reference librarians from China for features including the ability to sort by citation count, the ability to filter by databases, social sharing features etc. Again these requests weren't unique to users in China; I myself have heard such requests from our own users and librarians.

But by now I am familiar enough with the philosophy of Summon to know such requests were unlikely to be supported without strong evidence these would be used by searchers.

Of course, like every local market, China has unique requirements and features, including censorship, discussions about working with the China Academic Library and Information System (CALIS), the Chinese consortium group, to create packages for selection, libraries presenting on Chinese ebook batch loading etc.

And of course there was concern that while Summon had very good coverage of Chinese material, compared to some local Chinese discovery systems it was still weaker, and a discussion on whether this was truly a problem.

From the admittedly simplistic point of view of a librarian outside China, it seems to me that if the best university in China, Peking University, has chosen Summon, there is some assurance at least that Summon has reached a certain level here, though obviously it can be improved further, particularly if Chinese material is your main concern.

It was also impressed on me how much Summon benefited from collaborating with Peking University. The university helped Summon with relevancy ranking of searches in Chinese and, I think, helped to provide a thesaurus/dictionary list of 2.7 million Chinese names etc.

There were also discussions of the possibility of using Summon's API to populate institutional repositories (probably not), and of future developments. Unfortunately I promised not to blog about some of the possible future developments mentioned, though I think I can say that SerialsSolutions is working hard on further improving relevancy ranking.

It was also announced that 10 universities in China are currently signed up with Summon as well as other high profile signups around the world including Yale and Cambridge (I think).

Somewhat amusing is that I also sat through my third talk on the upcoming Summon 2.0, by 3 different presenters to boot at 3 different occasions. :)

It was not all about Summon. As this was a SerialsSolutions user group meeting, there were also presentations and discussions on 360Marc, 360Counter and Intota.

Interaction with librarians

Sadly even in the best of times, I am quite introverted but this time my doubts about my command of the Chinese language made it even harder for me. Thankfully, some librarians from China, took the initiative to talk to me and I tried to converse about librarianship in general e.g the image and perception of librarians in China in my poor Chinese.

Some librarians I spoke to were also from Universities that traditionally have strategic alliances with Universities in Singapore and a few others also mentioned colleagues currently working in libraries in Singapore.

This led me to think about the possibilities of exchanges and strategic alliances between libraries in Singapore and China as well as in other countries.

Coincidentally upon returning I read about the online collaborative projects between China Librarian Hua Sun & American Librarian Mark Douglas Puterbaugh entitled Using Social Media to Promote International Collaboration. This paper described how interaction via the Facebook group Library related people led to fruitful international collaborations.


As a sidenote, there's a certain librarian in Singapore who seemed pretty famous in China as I was asked by at least 3 librarians whether I knew her and asked to pass on their best wishes. :)



Besides Chinese librarians, I also had the chance to meet and chat with  John Law, Vice President, discovery services, Serials Solutions who was at the user group meeting as well. The librarians in China were calling him "the Father of Summon" and it was interesting to hear his take on why he came up with Summon.





Travel & Sightseeing

















As is traditional for me, I combined work with sightseeing: I extended my trip by a couple of days and took the opportunity to tour Xi'an after the user group meeting. This was my first visit ever to China, and Xi'an is an ancient city that was the capital of many past dynasties in China.

I visited the Terracotta Warriors (twice!), Huaqing Hot Spring or Huaqing Palace etc. Since this is a librarian blog not a sightseeing blog, I won't describe further what I saw and experienced but I will say if you are into culture and history, Xi'an is definitely a good place to visit.

Obviously, it was a very interesting and educational trip for me, my very first trip to China!

I would like to thank the staff of SerialsSolutions and Xi’an Jiaotong University for graciously hosting us and showing us around.

Thursday, April 25, 2013

How are discovery systems similar to Google? How are they different?

Like many academic libraries, we recently launched our discovery service, Summon. Having worked intensively on this project through the evaluation phase in 2011 and the implementation phase in 2012, I had an opportunity to delve into the topic perhaps more deeply than many of my colleagues not on the team.

I would guess most librarians probably see Summon and its competitors as "Google, but for academic research" or as "Google Scholar like". For sure, users see it that way and so did I.

In a way this isn't a bad way to understand Summon. Similar to Google, Summon builds a centralised index that it queries whenever you search, so you get almost instant results. Of course this isn't how older library federated search products work: those pull in results in real time from multiple sources (the library equivalent of web metasearch services like ixquick) rather than storing the data beforehand in a single index.
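To make the contrast concrete, here is a minimal Python sketch of the two approaches. It is only an illustration under my own assumptions, not how Summon or any federated search product is actually built; the source names and records are made up.

```python
from collections import defaultdict

# Made-up records held by three separate sources.
SOURCES = {
    "catalogue": {"c1": "history of singapore", "c2": "library discovery systems"},
    "publisher_a": {"a1": "web scale discovery in libraries"},
    "publisher_b": {"b1": "discovery of penicillin"},
}

def build_central_index(sources):
    """Harvest every record ahead of time into one inverted index."""
    index = defaultdict(set)
    for source, records in sources.items():
        for record_id, text in records.items():
            for word in text.split():
                index[word].add((source, record_id))
    return index

def search_central(index, term):
    """Discovery-style search: a single lookup in the prebuilt index."""
    return sorted(index.get(term, set()))

def search_federated(sources, term):
    """Federated-style search: ask every source at query time, then merge."""
    hits = []
    for source, records in sources.items():  # stands in for one remote call per source
        for record_id, text in records.items():
            if term in text.split():
                hits.append((source, record_id))
    return sorted(hits)

INDEX = build_central_index(SOURCES)  # built once, before any user searches
```

Both functions return the same hits here. The difference is where the work happens: the central index pays the cost up front, while the federated search pays it on every query and is only as fast as its slowest source.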

Similar to Google, Summon generally isn't restricted to searching just the metadata or the bibliographic record and searches through full-text of most journal articles and many books - if these are available or provided.

Other similarities include the holy grail of "the one search box" that searches "everything" (or close enough) and heavy focus on relevancy ranking to surface desired results.

As a sidenote, relevancy ranking isn't really new to library catalogues by now (for example our "next generation catalogue" Encore has relevancy ranking and so does the older webopac). But one thing that is often missed by librarians is that because Summon searches full-text rather than just the metadata/library record, Summon's relevancy ranking owes more to how typical web search engines work and is to a large extent unpredictable.

Even if you knew the exact formulas and the weighting of each factor, you would have to crunch the numbers yourself, and it probably doesn't work along the lines of "no matter what, this journal must appear on top because it matched 245$a and....". For sure you can't easily "explain" why one result appears on top and not another.

As stated in my post How is Google different from traditional Library OPACs & databases?, Summon is probably as close to Google/Google Scholar as any library-associated search currently out there, with features like autostemming and search over full-text. Summon 2.0 will come even closer by adding automatic query expansion that searches synonyms.



Other upcoming features like "topic explorer" which pulls in short entries from reference material from sources such as Britannica online and Wikipedia, reminds me of a very primitive form of Google Knowledge graph at least visually (as far as I know Summon has no Semantic search). For example compare the following result from Google for "heart attack".



With the topic explorer in Summon 2.0

http://www.serialssolutions.com/en/services/summon/summon-2.0

I would add that such "Topic pages" are not unique to Summon; for example, Ebsco Discovery Service is adding topic pages.

Summon 2.0's content spotlighting that "Groups newspaper content for easy identification" and "Local collection and image spotlighting" reminds me of how Google's "universal search" dynamically shows content from Google News and images when necessary.

Below is a Google search with news items being distinctly grouped and highlighted.


In short, both functionally and visually, Summon is getting very close to Google, with the main exception that it does not do a soft AND: unlike Google, it never drops terms from the search.

As a sidenote, there are metadata fields in Summon that are never displayed to the user but are indexed and matched. So occasionally Summon might appear to do a soft AND and pull out results that do not match all terms (taking stemming into account), but that is just an illusion.
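A small Python sketch shows how a strict AND can look like a soft AND when some indexed fields are hidden. The field names and the record are hypothetical, and stemming is left out to keep it short.

```python
def matches_all_terms(record, query):
    """Strict AND: every query term must occur in some indexed field."""
    indexed_words = set()
    for field_text in record["indexed_fields"].values():
        indexed_words.update(field_text.lower().split())
    return all(term in indexed_words for term in query.lower().split())

def visible_terms(record, query):
    """Query terms that a user can actually see in the displayed fields."""
    shown_words = set()
    for name in record["displayed"]:
        shown_words.update(record["indexed_fields"][name].lower().split())
    return [term for term in query.lower().split() if term in shown_words]

# Made-up record: the subject field is indexed but never shown to the user.
RECORD = {
    "indexed_fields": {
        "title": "Heart attack recovery",
        "hidden_subject": "myocardial infarction",
    },
    "displayed": ["title"],
}
```

Searching for "heart infarction" matches this record under a strict AND, yet the user only sees "heart" on screen, so it looks as if "infarction" was dropped.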

As such I think while most librarians know how Summon is similar to Google/Google Scholar, what is often not mentioned is how different Summon is from Google. These differences are often technical but I suspect drive a lot of unhappiness towards discovery services because they can't meet "Google level expectations"

I am not a technical expert, but I believe the main difference between Google/Google Scholar and Summon stems from the fact that

Google mostly obtains knowledge of webpages/articles by crawling such pages and harvesting them directly using spiders, whereas Summon generally doesn't.





See also Google's Inside Search





This difference has 2 effects

1) Less stability in links
2) Less capability in relevancy ranking

Have you ever wondered why Google or Google Scholar seem to have a much lower broken links rate despite covering so much ground?

Essentially, Google has bots that go out to different webpages and capture the information on those pages; from there, the bots crawl on to other pages via the links they find.

Google Scholar is similar:
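As a toy illustration of that crawling idea, here is a breadth-first crawl in Python over a made-up miniature "web" held in a dictionary. Real crawlers fetch pages over HTTP, respect robots.txt and revisit pages periodically; none of that is modelled here.

```python
from collections import deque

# A made-up miniature web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["articles", "about"],
    "articles": ["paper1", "paper2"],
    "about": ["home"],
    "paper1": ["paper2"],
    "paper2": [],
    "orphan": ["home"],  # nothing links here, so a crawler never finds it
}

def crawl(web, seed):
    """Breadth-first crawl: visit a page, then follow its links."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # a real bot would fetch and store the page here
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Starting from "home", the crawl reaches every page that is linked from somewhere, and each visit doubles as a check that the page exists. The "orphan" page is never discovered because no page links to it.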

"Google Scholar uses automated software, known as "robots" or "crawlers", to fetch your files for inclusion in the search results. It operates similarly to regular Google search. Your website needs to be structured in a way that makes it possible to "crawl" it in this manner. In particular, automatic crawlers need to be able to discover and fetch the URLs of all your articles, as well as to periodically refresh their content from your website."

I recently led a workshop on using Google Scholar for bibliometrics and, hard as I tried, the questions I was asked make me suspect that many participants just couldn't wrap their minds around how Google Scholar obtains entries for indexing compared to how Scopus and Web of Science work.

http://www.google.com/intl/en/scholar/inclusion.html pretty much sets out the inclusion guidelines for what Google Scholar will index.

Essentially, a PDF file that looks vaguely article-like (e.g. title in a big font, authors on a line just below it, a section titled References etc.) and sits on an .edu domain will be considered scholarly and included in Google Scholar if the spider comes across it.

I believe Summon generally does not find information to index this way (I could be wrong).

This difference means that in general Summon relies mostly, if not fully, on the quality of the information given by publishers and other providers (whether via FTP/USB/OAI-PMH) and does not really "know" whether that information is correct, as it has not really "seen" the page or article in question on the site.

While Summon and competitors in its class try to obtain full-text as well as metadata whenever possible, they rely heavily on the cooperation of the content owner. So often they may have just the metadata but not the full text, particularly for smaller, less technically capable content owners. By comparison, Google Scholar can pretty much grab "everything", full text and all, as long as its spiders are allowed in. My anecdotal testing shows this sometimes makes a big difference: for example, compare the search Summon eds discovery in Google Scholar vs in Summon, and you will notice more relevant results appearing in Google Scholar due to more full-text indexing, even though most of the articles shown in Google Scholar are indexed (metadata/abstract only) in Summon as well.

This also means unlike Google, linking in Summon is going to be less reliable. Let's leave aside the complication of journal articles residing in different locations and the need to use openurl resolvers and assume all articles reside only at the/one publisher.

Google is generally sure that when it displays a link the webpage exists; at least at the point in time when the bot harvested the page, it was definitely there. And because Google directly checks whether the page exists, it can easily do link checks and fight link rot. It can even tell which domains tend to have more broken links and penalize such sites.

Imagine if Summon had such data and could use it to automatically adjust openurl database ordering when there are multiple copies available.

I don't think Summon has a way of knowing which links are broken, though. Even though Summon has "Index-Enhanced Direct Linking", which uses information from the publisher for more reliable linking compared to OpenURL linking, it is still not directly checking whether the article exists. For instance, I notice many of these partnerships seem to be using DOIs and, believe it or not, DOIs occasionally still do not resolve properly.

The other thing that people like to moan about is the relevancy ranking. Why isn't Summon's as good? Don't get me wrong, Summon's is very good, but I doubt anyone would say it's better than Google's, and I would guess many if not most would say it isn't as good. I also have anecdotal evidence: so far the dedicated Google Scholar users I know of have not switched to Summon, though they acknowledge Summon is a very good effort. That signals that, at the very least, Summon isn't sufficiently better to be worth switching to.

Google has a very sophisticated ranking system of course. It can rank based on social signals, usage, click data etc., which leads to fears of filter bubbles, where you get totally different results depending on who you are, when you search, where you are when searching and so on.

In any case, I don't believe Summon currently uses any of this, though I would love to see Summon take into account click data, usage etc., whether at an institutional or a global level, if it doesn't already, similar to how Summon generates "related search" suggestions.

Summon related searches


But the better relevancy also stems from the fact that because Google directly crawls each page, it can study the linkage patterns between webpages, which led to the famous PageRank algorithm. As you know, each inbound link is a "vote of approval" from the source page that the destination page is important. While this may not be as dominant a factor as it used to be now that there are other "signals", it's easy to believe it is still very useful for Google.
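For readers who want to see the "votes" idea in action, below is a simplified PageRank computed by power iteration in Python. It is a textbook-style sketch on a made-up four-page graph, with the commonly quoted damping factor of 0.85 as an assumption, and certainly not what Google runs today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over a dict of outbound links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Made-up graph: three pages link to "hub"; "hub" links back to "a" only.
GRAPH = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
RANKS = pagerank(GRAPH)
```

In this toy graph "hub" ends up with the highest score because three pages vote for it, and "a" outranks "b" and "c" because its single inbound link comes from the highly ranked hub.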

There's a beautiful explanation here.

"The Web is a complex network of interlinked documents and files. It's vast. It's open. Although much of its data is not very well-structured, it does at least share a common structure (HTML, XML) and a common infrastructure. You can write a program that crawls from document to document on the Web and automatically gleans lots of contextual information based on what links to what, the text in which the link is embedded, and lots of other contextual clues. The contextual data might not be 100% accurate, but it's incredibly rich."

Then it goes on to explain why library data is different.

"Library data, on the other hand, consists mostly of various separate pools of records/resources that, 1. have little (if any) contextual data, 2. are not linked together in any meaningful way (not universally and not with unambiguous, machine-readable links), 3. do not share a common structure, 4. do not share a common infrastructure, and 5. are generally not freely/openly available. So much of what Google has leveraged to make Web search work well is simply not part of library data. "

This is about Google, but I would guess it applies to Google Scholar as well, though to a lesser degree.

For Summon, the closest equivalent we have is citation data from Web of Science/Scopus. I have no information on how this is used, but regardless, given that most articles are not even cited once (at least as seen in the citation indexes of Scopus or Web of Science), this citation web is a very poor substitute for the link analysis Google uses.

I would add that it's well known that Google Scholar generally shows more cites than Web of Science for the same article, due to the "looseness" of what is considered a cite, so this technique of weighting results based on cites is far more effective for Google Scholar.

Can Summon further improve its relevancy ranking? Yes. For example, Google is famous for personalizing search results, using either the fact that you are logged into a Google account or a long-term non-expiring cookie, as well as hundreds of other cues including social media related ones.

Google Scholar, as far as I can tell from doing the same search on different systems and IPs, isn't that personalised, but that's beside the point.

Could Summon do personalised results? In theory it could take into account who the logged-in user is, what discipline they are in, what level of study etc., similar to what Primo's ScholarRank claims to do.

But this would still lack the link analysis Google can do by studying the web as a graph of inter-related articles.

One wonders if adding data from citation managers like CiteULike and Mendeley could help improve relevancy ranking, and of course if altmetrics takes off (in many ways these would be the "social signals" of scholarly works), Summon could exploit that as well.

Beyond that, I am not sure what the solution is for better relevancy. Perhaps moving towards a "linked resource discovery environment" (a concept I don't fully grasp) would help, but that would be a fundamental change compared to the shift towards web scale discovery services. And as more and more content gets sucked into Summon and its competitors, this problem of relevancy ranking is not going to get better.

Conclusion

This post is just my educated guess on how Summon and Google work, and I might be totally wrong. If you know more and are aware of errors, please share what you know in the comments.

Monday, April 1, 2013

More good library related video that spoofs movies or tv

Some of my most popular blog posts in 2010 include 12 good library videos that spoofs movies or tv and Funniest library related movies made using Xtranormal.

It has been almost 3 years since then, and libraries have been hard at work creating more interesting yet professional videos. These are some of my favourites, including some I missed the last time around.

1. The Research Games - by Texas A&M University Libraries 




Everyone loves a good spoof, and this one by Texas A&M University Libraries is a high quality spoof of The Hunger Games. The theme fits beautifully, with librarians taking the roles of mentors/ex-victors, giving advice.

It's a very high quality production. If there is any weakness, it is that if you haven't read the book or watched the movie (which I hadn't at the time this video was released), you may not catch all the references.

After watching the movie and reading the book, I really appreciated how clever this was.

Don't miss the concluding episode here.



2. Research Rescue  - by The Harold B. Lee Library Multimedia Unit 




In our original 12 good library videos that spoofs movies or tv, we included at #8 a "Cops" like spoof. But this one by The Harold B. Lee Library Multimedia Unit  looks even better.

It actually makes a librarian look really cool, I want a "Research Rescue" badge too! Incidentally it made me realise the phrase "Research rescue" is actually used by a few libraries!

Watch Episode 2 "Book Fort" and concluding Episode 3 "And We're Done"


3. BR | Harold B. Lee Library Book Repair by The Harold B. Lee Library Multimedia Unit




Seriously, I could fill the list here with just productions from The Harold B. Lee Library Multimedia Unit. Among the ones I liked are the short but effective videos that use unreliable sources like fortune tellers and used car salespersons to drive home the point of using reliable sources. See Library Databases | The Card Reader, Library Databases | The Used Car Salesman and Library Databases | YouTube Kid.

I also liked the warm, moving, THE Library | What Changes Us video as well as the National Treasure like Special Collections | Theatrical Trailer, not to mention the famous Old Spice spoofs

But in the end the one I am going to showcase is BR, Book Repair, a spoof of the opening credits of the TV show ER. If you have ever watched the show you will marvel at how good this is. I would add that this concept isn't new; see Arlington Heights Memorial Library's Technical Services for a less polished example.


4. The Science Network - A Social Network Parody



Not sure who did this one, but it's a brilliant spoof of the trailer for The Social Network (http://youtu.be/2RB3edZyeYw), except that instead of Facebook the subject is PubMed. Arguably Mendeley is a better fit :)


5. Find the Future at the New York Public Library Game Trailer by NYPL



I have written in the past on how adept the New York Public Library is at using social media, and of course they also produce high quality videos. There's The Haunted Library and also the NYPL Milstein Suspense Trailer. Still, the Find the Future NYPL Game Trailer with its X-Files type feel is my favorite by far, though the NYPL Milstein Suspense Trailer comes close.

6. The Most Interesting Librarian in the World by Library and Information Science grad students at the iSchool of Syracuse University




By a group of library students, this spoofs the by now famous "Most interesting man in the world" ads, which have of course become a famous meme. Here's another similar spoof involving a real-life librarian.


7. Detection Trailer- Inception Parody/Spoof for Burlingame Public Library

I haven't seen any other Inception parodies involving libraries. This one was done as promotion for a detective type game at Burlingame Public Library.

8. Victory Lap by The Harold B. Lee Library Multimedia Unit



Okay, I couldn't resist adding one last one by The Harold B. Lee Library Multimedia Unit. Entitled Victory Lap, it's so much fun that I had to include it.


Honorary Mentions

I have always been impressed by the level of professionalism by the Arizona State University Libraries "The Library Minute" series. Smart, cool and hip.

I know librarians modifying hit songs and doing musical style videos is so overdone (eg Lady Gaga, I will survive, Thriller etc) but I just have a soft spot for Read it Maybe (NYSRA 2012)  

Are there any other library videos you like? Let us know in the comments.

Monday, March 11, 2013

4 ways to bring users to your library resources from Wikipedia

Surveys of PhD students in the UK and researchers in the US, not to mention ordinary users, have shown that the academic library site is increasingly declining in importance as a starting point for searching.

Besides Google, the main site they go to is Wikipedia, either directly or via Google, because it ranks highly in Google for most topics. There is even a name for this, GWR or Google > Wikipedia > References: the process where people Google, click on a Wikipedia result and look at the references.

I won't go through all the debates about Wikipedia among librarians, though Wikipedia is not wicked is probably the most spirited defense of the "pro side" of the matter. Suffice to say, librarians should look for ways to help users get from Wikipedia pages to library resources more easily.

But how? What follows are 4 ways I know that allow users to link back to library resources easily, using


1. Bookmarklet

2. Libx browser extension

3. Wikipedia "book sources"

4. Wikipedia "Library resource box"

If you have no time, I highly recommend you look at the 4th method. It is a must read.


1. Bookmarklet

I suppose if you read this blog you know what a bookmarklet is, but in case you don't, it's just a simple bookmark containing some JavaScript that carries out a simple action when clicked.

Barbara Arnett and Valerie Forrestal way back in 2010 in Bridging the gap from Wikipedia to scholarly sources: a simple library bookmarklet showed us how to create a bookmarklet that did the following when clicked on a wikipedia page.

1. It would take the wikipedia title

2. Throw it into a search (you can edit it first) and that would bring the user to the library's search - in this case Ebsco Discovery Service.

Here it is in action



Obviously it's trivial to change this to Summon or whatever search you want. But that's not all: cleverly, they built in Google Analytics, so you can keep track of usage/clicks of the bookmarklet.

It's a trick they helped me adapt for our highly popular proxy bookmarklet, so now I can tell how popular it is.

This nifty trick has been adapted by other libraries, including MLibrary. There are some doubts about whether people would bother to set up a bookmarklet or remember to use it, but that's the beauty of this bookmarklet: you don't have to guess, the analytics are there.

I currently don't use this bookmarklet, and in the past I would probably have said that no-one but a niche audience would use bookmarklets. But looking at the heavy usage of our proxy bookmarklet (possibly the subject of a future conference talk, so I won't say much except that it's insanely high), I wouldn't rule out the possibility of this bookmarklet being used.
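The two steps the bookmarklet performs (take the Wikipedia title, throw it into a library search) boil down to building a URL. The real bookmarklet is JavaScript running in the browser; the Python sketch below only illustrates the URL-building logic, and the search address is a hypothetical placeholder you would replace with your own discovery service. The Google Analytics tracking is not shown.

```python
from urllib.parse import quote_plus, unquote, urlparse

# Hypothetical search URL template; swap in your own discovery service.
SEARCH_TEMPLATE = "https://library.example.edu/search?q={query}"

def title_from_wikipedia_url(url):
    """Pull the article title out of a /wiki/Title style Wikipedia URL."""
    path = urlparse(url).path
    if not path.startswith("/wiki/"):
        raise ValueError("not a Wikipedia article URL")
    return unquote(path[len("/wiki/"):]).replace("_", " ")

def library_search_url(wikipedia_url, template=SEARCH_TEMPLATE):
    """Build the library search link that a bookmarklet like this would open."""
    title = title_from_wikipedia_url(wikipedia_url)
    return template.format(query=quote_plus(title))
```

For example, calling library_search_url("https://en.wikipedia.org/wiki/Eulerian_path") gives a search for "Eulerian path" on the placeholder library site.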


2. LibX browser plugin

So maybe bookmarklets are hard to remember, but what about browser plugins? LibX by Annette Bailey and Godmar Back of Virginia Tech is probably the most famous of them all.

It's a free service that any library can set up, and it gives you a host of functionality that makes it easy to go from any webpage to library resources.

Among my favourites are hot-linking of ISBN/ISSN/DOI/PMID (basically it converts such strings to clickable library searches), appending of ezproxy on pages or links, support of COINS, link resolvers, xisbns etc.
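To give a feel for what hot-linking involves, here is a rough Python sketch that spots DOIs and checksum-valid ISBNs in a piece of text. It is my own simplified illustration, not LibX's actual code; the regular expressions are deliberately loose assumptions, and only the ISBN check-digit rules are standard.

```python
import re

# Loose patterns, assumed for illustration only.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
ISBN_CANDIDATE = re.compile(r"\b\d[\d-]{8,15}[\dXx]\b")

def is_valid_isbn(raw):
    """Check the ISBN-10 or ISBN-13 check digit, ignoring hyphens."""
    digits = raw.replace("-", "").upper()
    if len(digits) == 10:
        if not digits[:9].isdigit() or not (digits[9].isdigit() or digits[9] == "X"):
            return False
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(digits))
        return total % 11 == 0
    if len(digits) == 13 and digits.isdigit():
        total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                    for i, ch in enumerate(digits))
        return total % 10 == 0
    return False

def find_identifiers(text):
    """Return (kind, value) pairs for DOIs and checksum-valid ISBNs in text."""
    found = [("doi", m.group(0).rstrip(".,;)")) for m in DOI_PATTERN.finditer(text)]
    for m in ISBN_CANDIDATE.finditer(text):
        if is_valid_isbn(m.group(0)):
            found.append(("isbn", m.group(0).replace("-", "")))
    return found
```

Each identifier found this way could then be wrapped in a link to a catalogue or link resolver search, which is essentially what turns a plain string on a web page into a clickable library search.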

The latest version even integrates with Summon, so you can mouse over unique identifiers and check availability.




Watch the screencast here

In short, it allows users to interact with library resources using multiple methods even if they are not on the library page.

Most of these features work on all web pages and are independent of Wikipedia, but the support of COINS means there is some Wikipedia support. COINS, without getting too technical, is a way to mark up citation data in HTML so that tools like LibX and Zotero can parse the citation and use it to connect to full-text via your link resolver.

Or rather there *was* support; as of Nov 2012, COINS support was sadly removed.
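For the curious, a COINS entry is an empty span with the class Z3988 whose title attribute carries the citation as OpenURL key/value pairs. The Python sketch below pulls those pairs out of a page using only the standard library; the sample span and its book title are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs

class CoinsParser(HTMLParser):
    """Collect the citation data carried by COINS spans (class Z3988)."""

    def __init__(self):
        super().__init__()
        self.citations = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "Z3988" in classes:
            # The title attribute holds an OpenURL key/value query string.
            fields = parse_qs(attributes.get("title") or "")
            self.citations.append({key: values[0] for key, values in fields.items()})

# Made-up sample markup in the COINS style.
SAMPLE_HTML = (
    '<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;'
    'rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&amp;'
    'rft.btitle=Example+Book&amp;rft.isbn=9780306406157"></span>'
)

PARSER = CoinsParser()
PARSER.feed(SAMPLE_HTML)
```

After feeding the sample, PARSER.citations holds one dictionary with keys such as rft.btitle and rft.isbn, which is the information a tool needs to build a link resolver query.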


3. Wikipedia "book sources"

Both methods above rely on users installing something, but most users will not. How about something built into Wikipedia?

There's a feature called "Book sources" in Wikipedia.

It says

"This page links to catalogs of libraries, booksellers, and other book sources where you will be able to search for the book with ISBN. If you arrived at this page by clicking an ISBN number link in a Wikipedia page, then the links below (those labeled "find this book") search for the specific book using that ISBN number."



Confused? Here's how it works. Go to say the Wikipedia article Eulerian path

You will see the following




Now click on the ISBN and it brings you to a page titled Book sources, this page in particular.

If you jump to the section on Singapore you see




Click on it and, you guessed it, you get an ISBN search in your catalogue. In this case it works nicely, as we have the book.






I only realised there was such a feature when I was looking at referrers to our catalogue and noticed a fair amount of them from Wikipedia.

Some were ordinary links in the "external links" page but some were isbn links.

This is a nice fairly obscure feature but really isn't very convenient to use if you ask me.


4. Wikipedia "Library resource box"

None of the things I mention above are new, but this last one is. Rather than explain, let me just show you a Wikipedia article into which I inserted the library resource box, in this case the article Japanese occupation of Singapore.

At the bottom of the page, in the external links section, you see this, including the box I added.




If you click on "resources in your library" and this is your first time doing so, you will be brought to a library selection page.






Obviously you pick the library you are with, or better yet "set a preferred library for future searches", and when you click it will use the Wikipedia title to do a search in the library you selected.

In this case it is our Summon search.

As you can see it's a very nice search result, showing off the strength of our collection including local theses, books etc.




In fact, what it uses to search is sometimes more complicated than just the Wikipedia title. For example, you can override the search to use a Library of Congress subject heading instead of the Wikipedia title by adding

|lcheading=xxxxxxx .

You can see this in effect here

Other times it finds the closest mapping in a file of LCSH kept in the system. It can also use VIAF, I believe, and if all else fails it just uses the Wikipedia title for a general keyword search. I am not sure I got the explanation 100% correct, but I think you get the idea.
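My understanding of that fallback order can be sketched in a few lines of Python. This is purely my own guess at the logic, not the actual code behind the library resource box; the mapping table and its one entry are illustrative stand-ins, and the VIAF step is left out because I don't know how it fits in.

```python
# Illustrative stand-in for the title-to-LCSH mapping file; not the real data.
TITLE_TO_LCSH = {
    "Japanese occupation of Singapore":
        "Singapore--History--Japanese occupation, 1942-1945",
}

def choose_search(wikipedia_title, lcheading=None, mapping=TITLE_TO_LCSH):
    """Pick a search strategy in the fallback order described above."""
    if lcheading:
        # An explicit |lcheading= override wins over everything else.
        return ("subject", lcheading)
    if wikipedia_title in mapping:
        # Otherwise try the automatic mapping to a subject heading.
        return ("subject", mapping[wikipedia_title])
    # Last resort: a plain keyword search on the Wikipedia title.
    return ("keyword", wikipedia_title)
```

So an article with a known mapping gets a subject search, an article with a manual override gets whatever heading the editor supplied, and everything else falls back to a keyword search on the title.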

It also can do something special if the article on Wikipedia is on a person. Below shows the one I inserted on the Wikipedia article Goh Keng Swee . 





You will see that because I changed the options there is now an "About Goh Keng Swee" link as well as a "By Goh Keng Swee" link.

If you click on the links below the two, you of course get different results.



The above is the result of the "about" link. It's a normal keyword search. Note that the search uses an LCSH even though the Wikipedia title isn't exactly that heading and I didn't manually override the title search with a specific LCSH; I think this is some automapping mechanism.

The one below shows the results after clicking the "by" link, which obviously does an author search.




Personally I think this is a wonderful idea. The author of this system, John Mark Ockerbloom in my opinion has hit on a great idea. You can see the blog post where he sets out the idea here. Specific instructions on adding the library resource box are here.

But how do you get your library into a list of libraries that appears when you click on the link? You simply request it and John will add it. He has been very quick to add libraries (most libraries use standard systems, eg, we use Summon which is used by over 500 libraries) and has also kindly and patiently answered all my questions about this great idea.

The great thing about this is that once I add the box in an article, all libraries in the system benefit. Right now, we are the only Singapore Library available in the list but it's trivial for John to add other Singapore libraries such as the National Library Board's etc and we all benefit!

I've added the box to a little under 100 articles, mostly on Singapore topics. I was cautious as well, checking that the search would give reasonable results. Part of my strategy has been to look at the most common searches in Summon, Google the same keywords, see which Wikipedia articles appear and add the library resource box to those articles.

Of course this resource box complements the strategy of inserting direct links to open access/free resources or libguides into Wikipedia, and using the Google site operator I can see this is a pretty popular strategy with some libraries. But in some cases you might not really have something unique to link to, or you may have lots of interesting items and it is too much effort to include them all properly in the article.

I intend to add more as time permits, and obviously I am studying our Summon logs to see how much traffic is driven to Summon this way (hopefully not too much of it by this blog post). The skeptic in me wonders whether people will click on such links, as the box is usually placed in the last section. Or, even in short articles, would they want to click to search the library?

Only time will tell.

Note: I just finished writing this and noticed that the comments on the blog post pretty much encapsulate this post, including the first 3 ideas, but I hope this was still useful.
