Online Information 2012 will be at a completely new venue, the ICC ExCel Centre.


ICC Aerial View
The International Convention Centre (ICC) is a new venue in East London and is Europe’s largest convention center. It will provide comfortable meeting space and much better connection between the conference and the exhibit hall than was possible at the Olympia. The ICC is conveniently located only 1 mile from the London City Airport, which serves nearby areas in the UK and other parts of Europe.

Besides a new venue, the organizers of Online Information have promised other new features and an updated program.
I hope to see you there.
Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor
This session featured 3 search experts reviewing current trends and developments. Marydee Ojala, Editor, ONLINE Magazine and long-time online searcher, led off with a presentation entitled “So Many Search Engines, So Little Time”. Of course, the most popular search engine is still Google, but its relevancy is declining, there is no commitment to advanced search options, and it seems to be pulling back from features admired by information professionals. Alternatives to Google are:
- General web search engines. Bing by Microsoft is the most familiar. It features field searching and search refinement (i.e., advanced search). Yahoo’s search is powered by Bing except in Japan and South Korea, and it remains a takeover target.
- Specialty search engines concentrate on format (images, video, social media), or subject (news, science, business). A variety of country search engines are available, such as Baidu (China), Yandex (Russia), and Naver (South Korea). The Search Engine Colossus is an international directory of search engines. Blekko has no spam and filters out results from content farms. DuckDuckGo is known for its privacy because it does not save searches. Exalead is a cloud-based site for enterprise search and has some advanced features such as soundslike and spellslike. Topsy is now the only search engine for archival Tweets.
Many search engines feature databases of a variety of information types; for example, one can find databases of images, books, news, and maps on Google; images and finance on Yahoo; and travel, news, images, and video on Bing. Flickr and Picasa are well-known image databases, which can be searched by image criteria such as color. YouTube, of course, is the leading video search engine, but one can also find instructional videos from various universities as well as those from the Journal of Visual Experiments (JOVE).
- Paid search engines are mainly the traditional ones such as Dialog, Factiva, LexisNexis, EBSCO, and ProQuest. Some subject-oriented paid search engines are also available such as those from STN International, whose flagship database is Chemical Abstracts. In contrast to Google and some other web search engines, no SEO manipulation is done by these vendors, so results are very consistent.
Innovation in search continues, but it is happening at the margins and inside the enterprise. Search algorithms are changed frequently. (See the closing keynote session for a discussion of the future of search.) Information professionals must constantly keep up with changes in search engines and be ready to switch search tools quickly. This is time consuming, but it is necessary if we are to remain relevant.
Marydee closed by urging attendees to read The Filter Bubble.
Arthur Weiss, Managing Director, AWARE, continued Marydee’s theme and reviewed some specialist search engines for people, numeric data, and news. He noted that although search engines may claim to search the deep web, they may be only using a web crawler to find material on the visible web. True deep web search tools typically look for information not searchable by crawlers.
Weiss showed how a Google news search returns different results depending on whether one is logged in to a Google account or not. When you are logged in to your account, Google knows who you are, your location, and any preferences you have set. Several news search engines cater to business users, including Northern Light, Congoo, and Newsnow. Silobreaker and Evri aggregate news and return results on a topic. Silobreaker has a number of innovative features, such as a summary, headlines, and trend charts showing item frequencies. Evri has more images than Silobreaker.
People search engines are either directories of names or searches for names in the context of articles. Some of the second type include Pipl, 123People, and Yasni. Pipl has a US bias; the other two are based in Europe. Yatedo allows phonetic searches, searches based on links to other people, and other advanced options. Jigsaw is a database of online business cards and actively solicits contributions of them. Yoname searches people who are users of any of 27 social media sites.
Numeric searches can be difficult because much numeric data is presented in graphical format. Data from official statistical sources is available in the Offstats database, and the Open Data Directory provides links to over 400,000 databases of numeric data on a wide range of subjects. For scientific data, Wolfram Alpha is a good source; it presents data in tabular or graphical format. Lexxe searches data by using a “semantic key” approach and also reports results in a chart.
Karen Blakeman, Trainer and Consultant, RBA Information Systems, looked at what search engines know about us. “A lot” is known, so users must be well aware of this as they do their searches. In particular, Google knows us very well and personalizes search results based on the user’s location, browser, search history, blocked sites, “liked” sites, etc. Searches based on the user’s location attempt to return results relevant to the country, but they may return erroneous results because, for example, a company’s switchboard may be located in a different country. This has implications because access to some sites is blocked outside their local region.
Panopticlick will test your browser configuration and report how unique it appears to be. (The more unique it is, the easier it is to track unique information about the user.)
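As a rough illustration of why a rarer configuration is easier to track, uniqueness can be expressed in bits: if only one browser in N shares your configuration, that configuration carries about log2(N) bits of identifying information. The sketch below is not Panopticlick's code; the function name and the tiny sample of configuration labels are invented for the example.

```python
import math
from collections import Counter

def surprisal_bits(fingerprints, mine):
    """Bits of identifying information carried by one configuration.

    fingerprints: list of configuration labels observed across visitors
                  (hypothetical sample data, not real measurements).
    mine: the label of the configuration being tested.
    """
    counts = Counter(fingerprints)
    share = counts[mine] / len(fingerprints)  # fraction of visitors with this configuration
    return -math.log2(share)                  # rarer configuration -> more bits

# Made-up sample: 8 visitors, 4 distinct configurations.
sample = ["A", "A", "A", "A", "B", "B", "C", "D"]

bits_common = surprisal_bits(sample, "A")  # shared by half of the visitors
bits_rare = surprisal_bits(sample, "C")    # seen only once in the sample
```

In this toy sample the common configuration carries 1 bit and the rare one 3 bits, which is the sense in which a "more unique" browser is easier to single out.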
Search personalization and localization may not be all bad for users; for example, it is useful if you need to quickly find a local restaurant or are researching companies in a particular country. To explicitly search local listings, country versions of search engines are useful. Several browsers have an anonymous searching feature that turns off saving of searches, personalization, etc. You can also set your ad preferences in Google (www.google.com/ads/preferences).
Facebook is notorious for making it difficult to delete material, and it keeps material even when you think you have deleted it. Europe v. Facebook is a collection of complaints against Facebook and instructions for residents of Europe to request their data from Facebook under EU privacy laws.
In the news area, Google can seriously damage search results. Mary Ellen Bates recently did an experiment where she asked several searchers to enter the term “Israel” and send her the results. The results were startling: More than 25% of the stories were retrieved by only one searcher, and only 12% of the searchers saw the same 3 stories in the same order in their results. Google’s recently introduced “Standout” feature to tag content will make the situation worse.
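The two figures reported from this kind of experiment (the share of stories seen by only one searcher, and the share of searchers who saw the same top 3 in the same order) can be computed along the following lines. This is only a sketch of one possible tally, not the method Bates used; the function name and the four result lists are invented for illustration.

```python
from collections import Counter

def overlap_stats(results, top_n=3):
    """Summarize how much different searchers' result lists overlap.

    results: dict mapping a searcher label to an ordered list of story ids
             (hypothetical data for illustration).
    Returns (share of distinct stories seen by exactly one searcher,
             share of searchers whose first top_n stories match the most
             common top_n sequence, order included).
    """
    # Count, for each story, how many searchers saw it at least once.
    story_counts = Counter(
        story for stories in results.values() for story in set(stories)
    )
    unique_share = sum(1 for c in story_counts.values() if c == 1) / len(story_counts)

    # Count identical ordered top_n sequences across searchers.
    top_counts = Counter(tuple(stories[:top_n]) for stories in results.values())
    same_top_share = top_counts.most_common(1)[0][1] / len(results)
    return unique_share, same_top_share

# Invented example: four searchers, same query, different result lists.
results = {
    "searcher_1": ["a", "b", "c", "d"],
    "searcher_2": ["a", "b", "c", "e"],
    "searcher_3": ["b", "a", "f", "g"],
    "searcher_4": ["a", "c", "b", "h"],
}

unique_share, same_top_share = overlap_stats(results)
```

With this made-up data, 5 of the 8 distinct stories were seen by only one searcher, and half of the searchers share the most common top-3 ordering.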
So what should a searcher do? You can reject cookies, but then many searches will not run. Active management of cookies is possible, but it is time consuming. Scroogle.org provides an anonymized interface to Google, but it is for web search only. Duck Duck Go and Blekko do not keep web history or personalize search results.
Here are Karen’s recommendations in this uncertain and sometimes scary search world.
- You have some control over personalization, so damage limitation is sometimes possible.
- Sometimes a web search history is a convenience and personalization is a good thing. You must make this decision.
- If you have a Google or Bing account, be sure to log out of it when not using it.
- Regularly check your dashboard, privacy settings, and ad preferences.
- Clear histories if you do not need them.
- Remember that if you delete all cookies, you will lose your opt-out preferences.
Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor
Steve Arnold moderated his traditional closing panel discussing the future of searching. The format was the same as in the past: A series of questions from Steve with answers from the panelists who were:
- Gregory Grefenstette, Chief Science Officer, Exalead
- David Milward, Chief Technology Officer, Linguamatics
- Dave Patterson, CEO, Sophia

(L-R) Gregory Grefenstette, Steve Arnold (Moderator), David Milward, Dave Patterson
Steve introduced the panel by noting that the participants all represent successful companies gaining in sales. 2012 will be a very challenging year for all companies; the changes in technology that are now underway are as dramatic as we have ever seen. Social change in findability is uncovering a growing interest in using people to find answers.
Below is an edited transcript of the conversation.
1. What is the major trend in enterprise search and content processing for 2012?
Greg: Enterprise search is different from web search because lots of information is in unstructured sources. The current trend is to include both internal and external information in searches, so there is a growing body of linked data available. People want to know everything about what is going on both inside and outside the enterprise.
David: Content is getting more diverse. The number of people available to sift through the data is getting smaller, so text mining and other technologies are necessary to analyze the data, and they are being used in a hidden way behind the scenes. With more automation, keeping things up to date is important. Text mining provides interactive information and can be used like a search tool, so we can combine it with search.
Dave: Our focus is on content, understanding its meaning, and letting organizations leverage more value from their content. Discovery is very important. People want to find out what they do not already know about and link it with information they have. Cloud-based computing will be prominent for the next few years.
Steve: The focus is dramatically different from basic retrieval we had in the last few years. We have heard that Microsoft has made its search system available without charge as part of a bundle to large corporations, so in effect, search is free. Lucid Imagination is creating a free search system, and pricing pressure is increasing to where basic search has become a commodity.
2. In an environment of low cost systems, what is the value of a commercial system that costs much more?
David: We do not compete directly against search tools. Text mining finds indirect relationships, which you cannot do with a search tool. Information professionals want to use text mining to give added value over what end users would get. Terminologies are difficult and costly to create. Text mining tools are used with the technology.
Dave: Many free search tools are limited in functionality. Just finding some hits is no longer sufficient. If all you want is a flat list of documents, then the free tools are fine, but most enterprises want search tools that understand the content and leverage its value. Tools at the low end of the market will not return that value. Many organizations do not appreciate the time it takes to build something in-house using open source tools.
Greg: It is nice to have free search, and the tools are good at asking a question and getting an answer. We build applications where search is in the background and can connect to many resources. Another added value is the semantic processing we provide which is not available in the free tools. Modifications and fixes are either not available for the free tools or cannot be done quickly.
3. What is the impact of apps on search and findability and content processing?
Dave: We need to be more innovative in the way we present information to users. We cannot present results lists on the small screens of mobile devices, which puts more pressure on the intelligence of the search tool.
David: As you get to smaller devices, you need more understanding of the text and whether you can repurpose it differently. Faceted search and text mining allow us to structure the information and allow people to navigate through it. We may see more emphasis on push services that give people information wherever they are.
Greg: Users’ expectations are being raised; they want instantaneous response time, ease of use without training, and 24 hour availability. This is a challenge and Exalead has a solution. Mashup systems allow you to present information to different devices.
4. How is cloud reliability going to be addressed?
Greg: Search engine technology allows you to have constant availability.
David: We initially thought a cloud service would be interesting to small companies, but we found after launch that big companies were also interested. People want to concentrate on core competencies. They do not want to worry about keeping external services up to date and want someone else to do that for them.
Dave: There are issues of security and reliability. Are there any statistics that in-house servers are more reliable than cloud ones? Are we panicking as a result of the massive failure of BlackBerries on October 10-12? (Google “Blackberry outage” for more information.) Security is also an issue. Companies protect their data as much as possible and are reluctant to put their data on the cloud. Employee behavior is a bigger risk to companies; laptops are frequently lost.
5. In the search and business intelligence space, former search companies are repositioning themselves and are now saying they offer predictive analytics or customer service. To an old person, this is search. Are these new terms helping or hurting?
Greg: Search-based applications can be very varied. There are many semantic technologies in search engines that allow these different applications.
David: It is always good to describe what you are doing to solve a problem instead of calling it just search. There is lots of subtlety in language.
Dave: We are firmly focused on content and understanding its meaning, as well as on new and better ways of helping people deliver it. Semantics is an example of true business intelligence. We need natural language processing to understand what we are doing. There are many ways of extracting meaning from unstructured business information.
This was an excellent concluding note for the conference. The old standby, search, continues to change as technology advances.
Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor
So popular and widespread have e-books become that it would be rare to find a major information industry conference these days that lacked at least one session on them. I attended the “E-Books Unleashed” session which had talks by the presenters shown below that highlighted some recent e-book developments.

(L-R) Anders Mildner, James McFarlane, Giulio Blasi, Mary Joan Crowley
John Akeroyd, Director, Information Reports and Honorary Research Fellow at University College London and session moderator, introduced the session with the following sales data, which shows that the recent explosion in e-book sales continues unabated.


Anders Mildner, a journalist from Sweden, keynoted the session with a presentation on “E-Books, Reading and Culture: What Change Can We Expect?”
People have been reading aloud for hundreds of years, and listening has become a shared experience and a part of culture. The public became used to being passive, but 10 years ago, the emergence of social media changed this when people began to create their own media.
Social media deals with social objects, and when they emerge, we see a shift in value. For example, music only has a value when it is being shared, and with the rise of music downloading and sharing services, we see many music stores closing down.
E-books will create a shift from a passive listening culture to one of participatory reading, so reading will become a shared experience. Books are entering the world of remixable objects, where they can be cut, pasted, and shared. This is creating a shift in who holds power and an economic threat to the producers. The value of the printed book in economic and cultural terms will decrease as we become surrounded by digital books.

We saw a similar process in the music industry. People care more than ever about music but less about the medium on which it is delivered. Libraries are now making deals with publishers to be able to lend e-books, increasing the value of the book as a social object.
Libraries and librarians are facing an entirely new challenge, but we should be grateful. For the first time, we can re-define reading and do it together. The promise of the future is that we are able to engage in reading more deeply than ever before.
James McFarlane, CEO of Easypress Technologies, began his presentation, “Beyond the EPUB3 e-Book”, with a brief historical overview. The first serious attempt to produce e-books was on the Apple Newton in 1994. Only 2 books were available: the Bible and the Concise Encyclopedia Britannica. The Newton was backlit and had a short battery life. It did not succeed because there were better devices available. Then Jeff Bezos developed the e-Ink device in 2004, and the Kindle was born. Bezos said he would have “every book ever printed in every language available in less than 60 seconds”. Then came Apple’s iPad which has transformed the e-book and multimedia worlds. We are now seeing many competing tablets appearing frequently. Harry Potter books will soon be available as e-Books, and 100 million downloads are expected. (There will be 7 books in 68 languages with video clips, audio, games, and other related products). This will transform the market yet again.
Easypress has developed a way to convert files from Quark to e-books in a couple of minutes. That is EPUB2, which is available today. The goal of EPUB3 is to develop interactive books that go beyond the reading experience. The first EPUB3 reader will not be on the market until next year, and it will have many new features, including indexing, searching and navigating, video and audio, multi-column formats, and active hyperlinking.

A significant problem for e-books is indexing. 83% of print books have an index, but virtually no e-books do because the concept of a page does not exist in an electronic publication. So the index must be redesigned. Searching and navigating e-books is also problematical. In a future design, e-book indexes will have active links so that when you click, you jump to the most relevant section of the book, with the selected term highlighted. Then you can click on a search bar to get to related subjects. Today, we have only simple character-based searching which is not very useful.
Many people just want to find something to read. There are thousands of sites to help readers discover e-books. The next generation of iPads will permit opening several books simultaneously, so readers will be able to jump back and forth between them.
There are about 1.5 million e-books available today. Searching, navigating, and discovery will be highly necessary as that number grows. Keyword navigation, semantic referencing, and sentiment analysis will be used to help us move from simply finding some facts to discovery. EPUB3 will allow us to move beyond the book experience in ways we have yet to imagine.
In his talk on “Digital Lending Models for Public Libraries”, Giulio Blasi, CEO of Horizons Unlimited, described the Media Library Online (MLOL), a digital lending aggregator serving 2,300 public libraries in Italy with open access and print contents and including several different lending models.

Libraries are currently giving away music legally; is this possible with eBooks? “Social DRM” allows a single user to download the book and keep it forever.
The MLOL offers 4 major services:
- Shop: libraries manage their collections and get e-book backfiles,
- Customized library portal: a unified interface for searching, browsing, discovering,
- API: Integrates with OPACs, external authentication systems, and includes next-generation features allowing embedding the contents of libraries wherever they want from the web, and
- Cooperation: creation of consortia around any content collection. Service at national level allows any library to cooperate with any others.
Mary Joan Crowley, Librarian, Sapienza University of Rome, noted that for 600 years, the library has guaranteed the organized delivery of printed and other content, provided trained staff to support teaching and research needs of universities, and ensured that the necessary physical facilities were available. But then that all began to change. Rising costs, networked environments that the library could not compete with, declining usage (people never start their searches from the library building and frequently not from the library website), and generational shifts (students want information immediately) have caused a significant change. 65% of a library’s budget typically goes for journals, squeezing out books. How can we compete with huge information providers?
We have a great brand and must find out how to evolve it into the 21st century. We may not be courageous enough to make the shift. Even though we have a service ethic, good reputation, and the trust of our community that has been built up over centuries, people will not come to the library any more. We need to manage our resistance to change and direct it towards the critical application of new skills. One way to do this is to engage our users by going where they are, joining their conversation, and figuring out how to find the information they want. We must be content providers, not just suppliers.
The academic library is at the heart of the modern university, providing access, support, and services to its users. An e-book reader project with lesson-based content was started to serve customers not physically at the library. Although the e-book market is in its infancy, users are not; they are accustomed to downloading lots of content, content is evolving, and they want untrammeled access to all types of media. The reader project was successful; users became part of the conversation. As the library adapts to the changes in the value chain, it can provide incremental value in the ways shown here.

Crowley concluded her talk by showing the following adaptation of Ranganathan’s laws.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor
As is common in many conferences these days, Twitter was widely used. Each conference room had its own Twitter hash tag, which allowed a “Twitter Moderator” to relay questions from the Twitter feed to the speakers. Large monitors in the hallways of the conference center displayed the feeds for easy reading.
At the e-books session on Thursday morning, I found myself sitting behind two Tweeters.

They turned out to be Stephanie Kenna (at left below) and Hazel Hall (right), from the Library and Information Science Research Coalition. Stephanie was a Twitter Moderator for the session.

Stephanie Kenna (L) and Hazel Hall (R)
You will probably not be surprised to learn that many interesting opinions surfaced in the Twitter streams.
Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor
