Archive for the ‘SSP 2011’ Category

David Smith, Moderator (L), and John Palfrey

The closing plenary was presented by John Palfrey, Professor, Harvard Law School, a faculty director of the Berkman Center for Internet & Society, and a co-author of Born Digital. Palfrey is also Vice Dean for Library and Information Resources at Harvard Law School, so he focused much of his presentation on 4 broad issues currently affecting libraries and librarians. He noted that although there is a common cause between scholars, teachers, publishers, and librarians, we all face perils in separate ways.  Why do we still need libraries, publishers, etc.?  What is our role in learning space?

1. Changing patterns of learning

Youth and media are both born digital, which is not difficult to observe.

Kids have great digital skills, and there are different practices in information access. But it is not just kids who are learning in different ways: everyone has a smartphone, BlackBerry, etc., and they are widely used. By 2012, we will be more likely to access the web on a mobile device than on a PC, which will mean not distraction but interaction and multitasking. Even now, many of us do a variety of things while we learn.

The media we are interacting with are digital, whether they are images (Flickr), audio/video (YouTube), or print (Google). Only when reading a monograph do students customarily use printed materials, and the reasons they give are Bed, Bath, and Beach. In these circumstances, they prefer print by an 80/20 margin.

Issues:

Credibility.  There is a lot of misinformation on the web, as well as other hidden influences.  Almost all students will look for information first on Google, and then on Wikipedia.  Some cut and paste information from the Web into their papers; others don’t trust anything online.  Almost none of them go to the history or discussion pages of Wikipedia to determine the quality of the information, but most of them will go to the bottom of the Wikipedia page and click on the source links.

Overload.  There is too much information.  A major challenge is how to make more use of time when we are connected.

2.  Innovative teaching

How do we harness what is great about the digital era?  What will connected learning look like?  Being in the mode of a creator is very important.  The creators of today’s information are also creators of the code for such services as Facebook, Google, etc.  They were students when they started creating their services.

3.  Changing patterns of research and publishing

Open access is a major innovation in digital scholarship.  How do we make more of our libraries (Harvard has 73 of them)?  It is not about budget cutting.  How do we learn from our peers?  What can we do to support digital scholarship? Harvard Law School has committed to open access in faculty publications,

Harvard Law School Faculty Policy on Open Access

and is facilitating open access for student publications as well.

Open Access to Student Writings

A fund has been created to pay the fees if necessary.

How is our work related to mass digitization projects? Harvard participates in the Google book scanning project and has launched a Digital Public Library of America project. The challenge is whether we can create a free library on the scale of a large academic library and still be consistent with copyright, etc. This will be a useful project for the many people who are involved in text mining of these huge databases.

4.  Changing roles for libraries and librarians

Even the richest schools do not have increasing library budgets. The best case today is that they will be flat.  We are asked to get more materials from more places.  How do we make a future for ourselves?  We are not doing enough to connect our users to all types of content, especially digital materials.  How do we think about space in a way that connects the physical and digital?  Only 1% of the employees of the Harvard Library are devoted to managing the library’s website, even though half of the library’s traffic comes through it.

How do we architect classrooms for the digital age?  We need to think about how the information architecture relates to the physical architecture.  How do we think about sharing our collections differently?  No great library can go it alone in today’s environment of non-growing budgets.  We must be more precise with our acquisition policies and determine what we have that no other library does and which we therefore have an obligation to collect.

Should we be in the business of creating better and different interfaces for people to access repositories?  We need to be aggressively in the business of creating more content online.  People are worried about losing the idea of serendipity, so we need to present information in ways that would enhance serendipity.  We can create interfaces that will allow people to interact with information in ways that they cannot physically.  Circulation data can be used to see which materials are most popular and can influence how they are “arranged” on a digital shelf, which will help people access information better.
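The idea of letting circulation data shape a digital shelf can be illustrated with a minimal Python sketch. It is not a description of any Harvard system: the `Title` record, the checkout counts, and the choice to feature the two most-borrowed titles while leaving the rest in call-number order (so neighboring books still sit together for browsing) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Title:
    call_number: str  # hypothetical shelf position
    name: str
    checkouts: int    # hypothetical circulation count


def arrange_shelf(titles, featured=2):
    """Show the most-circulated titles first, then the rest in call-number order."""
    by_popularity = sorted(titles, key=lambda t: t.checkouts, reverse=True)
    top = by_popularity[:featured]
    rest = sorted(by_popularity[featured:], key=lambda t: t.call_number)
    return top + rest


# Made-up sample data, not real circulation figures.
shelf = [
    Title("KF 100", "A", 3),
    Title("KF 200", "B", 40),
    Title("KF 300", "C", 12),
    Title("KF 400", "D", 0),
]

for title in arrange_shelf(shelf):
    print(title.call_number, title.name, title.checkouts)
```

Keeping the non-featured titles in call-number order is one possible way to balance popularity against the serendipity of shelf browsing; other weightings are equally plausible.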

We need to pay attention to new technologies and be in the business of adopting and shaping them.  They may well be disruptive.

Opportunities

Look at where the problems lie and turn the challenges into opportunities. Opportunities lie in the areas of information creation, participation, and empowering individuals. We need to realize that we have a role to play in recreating our institutions. We are in a digital-plus era that is bringing a profound transition to every field. It comes back to our missions as teachers, librarians, and publishers, which are the same, even though we are all being disrupted in different ways.

Here are Palfrey’s final conclusions.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Trust Panel: (L-R) Carol Anne Meyer, Howard Ratner, Jan Brase

Cross Mark Initiatives:  Why a Monkey Matters
Carol Anne Meyer, Director of Business Development and Marketing, CrossRef

What do monkeys have to do with publishing? Well, of course there’s the “infinite monkey theorem” about an infinite number of monkeys typing for an infinite time and eventually producing something sensible, like a work of Shakespeare! (Even on a typewriter with just 50 keys, the probability that six random keystrokes produce a single word like “Hamlet” is (1/50)^6, or approximately 1 in 15 billion.) But I digress: Carol Anne Meyer explained that monkeys matter in scholarly publishing because a scholarly paper on monkey behavior was retracted because of misconduct by the author. So the question of trust is very important. Although a blog, Retraction Watch, tracks retractions, websites do not handle them consistently, with the result that readers may never know that an article has been retracted. ScienceDirect adds “RETRACTED” to the titles of such articles, but some websites do not offer any type of indication. And what about e-books or results from federated search systems?

Meyer pointed out that many things may happen to an article after it is published.  Here are some of them:

Documents on the web are living and can be easily changed. When content changes, readers need to be aware of it. Which version is the version of record? Most reputable publishers are trying to communicate this information, some better than others. Here is a record of an article from Science, showing a link to a correction.

Link to an article correction

CrossRef has attempted to solve this problem by developing CrossMark, a logo that can be attached to a paper indicating that updates exist. When the user clicks on the logo, a popup window opens indicating that updates are available.

CrossMark logo and popup window

This logo can be applied to PDFs or other web documents, providing a way for the publisher to list the Publication Record information, such as funding agencies, publication history, plagiarism screening, license types, etc.  The logo could even be displayed in Google search results, indicating the version of record.  It is important to note that CrossMark is not a DRM system.  A pilot test of CrossMark is underway now, with a 3rd quarter launch envisioned.

ORCID: An Open Registry of Scholarly IDs
Howard Ratner, Chairman, ORCID, Inc. and CTO, Nature Publishing Group

Researchers care about their identity when they join a faculty, apply for a grant, or submit a manuscript to a publisher. ORCID (Open Researcher & Contributor ID Project) supports a record of scholarly community by creating a reliable identifier record for authors. It was started in December 2009, with launch planned for early 2012.

ORCID is a non-profit consortium of 230+ participants, with the largest group being international universities and societies.  It is open to any organization with an interest in scholarly communication.  There are many identifier silos; ORCID hopes to bridge them.  ORCID’s mission is to create a permanent, clear, and unambiguous record of scholarly communication by enabling reliable attribution of authors and contributors.  All software developed by ORCID will be released under an Open Source Software license, and fees collected will be used to ensure the longevity of ORCID.

ORCID’s first efforts will be disambiguation of author names using “trusted linking partners” (TLPs) to create a relationship with self-asserted systems.  Input of records is very easy; building the author record is key.  Knowing publications of an author is important.  ORCID/DOI pairs will be sent to publishers during the article creation process.

DataCite
Jan Brase, DataCite

The concept behind DataCite is that data should be citable just like articles: to give it higher visibility, to make re-use and verification easy, to enhance the reputation earned for collecting and documenting it (for example, in a citation index), and to avoid duplication. To accomplish its mission, DataCite assigns DOI names, which scientists know how to use, to data sets, thus linking them to the supporting scientific article. It is a global consortium of local institutions and is hosted and managed by TIB, the German National Library of Science & Technology. Most members of DataCite are libraries because they are trustworthy organizations for scientists. Here are the 3 main goals of DataCite.

DataCite goals

DataCite has registered over 1 million datasets so far and has published a metadata schema for all of its members.  This metadata will shortly be uploaded into the Web of Science and other indexes.  In this way, DataCite supports researchers, data centers, and publishers.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

 

Educational content is being transformed.  Laura Fleming, an educational consultant and librarian, discussed Inanimate Alice, one of the more innovative current projects.  Inanimate Alice is a born-digital, multimedia, and interactive story for young children.  It is interactive because it requires user interaction to move the story forward.  As the story unfolds, the episodes become more complex, and the level of interactivity increases, holding the child’s interest.  The system has several unique features: it is multimedia and can be viewed in several languages and it is an example of “transmedia” because it can be viewed on any device that is capable of running Adobe’s Flash player.  According to the Alice website:

“ ‘Alice’ connects technologies, languages, cultures, generations and curricula within a sweeping narrative accessible by all. As Alice’s journey progresses, new storylines appear elsewhere providing more details and insights, enriching the tale through surprising developments. Students are encouraged to co-create developing episodes of their own, either filling in the gaps or developing new strands…children will grow up with Alice, from class-to-class from year-to-year, engaged with an ever-growing story in which they become part of the narrative.”

A downloadable education pack is available for teachers and educators to accompany the story.

Inanimate Alice is a unique example of how education and e-books will advance in the future.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

E-books have become widely accepted, but many users will not be satisfied for long with static e-books that simply recreate the print book experience online.  Print reference is still selling, but new technologies are having a significant influence on e-books.  Rolf Janke, Vice President and Publisher for Sage Reference, a division of SAGE Publications, said that librarians have a love-hate relationship with e-books.  Although students expect all the features, librarians are nervous about the associated costs.

Reference used to mean print, static, and black and white content. Today it is online (meaning e-books), on a platform, and e-book aggregators are beginning to appear. There are some dynamic e-books, but many products are still static. Some e-books have color. Simultaneous usage has completely transformed the world. Interactivity exists, but it generally consists of videos and podcasts, which are not defined as “animated”. And the next generation will be heavily mobile.

Reference is going digital. The basic tools provide a starting point, and interactive features add value to the user experience. Adding interactivity is desired, but it involves costs. Can publishers assume we will have an ROI?

A survey of librarians revealed some surprising opinions on what is valuable in reference services: desirable features included cross-searching all content on a single platform, “Did you mean?” for spelling corrections, citation builders, and videos. Features not seen as valuable were saving searches, editing content, linking content to social networks, and animation. Video is the most prominently used technology in reference sources. It must be built into an article to create value, and it must provide transcripts. SAGE released its first multimedia product in January and quickly observed that articles with video have been used more than any others in the entire SAGE collection.

Here is SAGE’s view of the e-book market.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

 

Web Analytics Panel: (L-R) Melissa Blaney, David Smith (Moderator), Mike Sweet, Mark Johns, Jake Zarnegar

When I saw the title of this panel, I wondered if it should be called “The Big Brother Panel”.  Web analytics provide lots of data on who is visiting your site, and in turn that allows you to develop strategies for understanding your business.

Melissa Blaney, Manager of Platform Analytics and Communications at the American Chemical Society (ACS), led off with a description of some of the common tools that the ACS uses. These include Atypon’s Literatum, which provides COUNTER reports, identity and content reports, and advertising statistics; Google Analytics, which is used to track usage of ACS online journals and community sites; and Omniture SiteCatalyst, which is used for tracking accesses to ACS’s weekly news magazine, Chemical and Engineering News.

Analytics are used by many internal stakeholders in an organization: executives, advertising, sales, web strategy and innovation, marketing, editorial, and sales analysis and support. The ACS has been providing COUNTER reports to stakeholders since 2002, and among other uses, online usage is one of the factors in determining renewal prices. Platform enhancements and web development initiatives are also influenced by the data. Beyond COUNTER, other statistics can measure key performance indicators (KPIs) for the business, such as referrals, searches, geography, unique registrants, etc. Useful information can be obtained from data on usage in various time periods, such as seasonality.

Here are some useful data that can be obtained from search analytics:


Data on referrals show how people get to the site and where they come from. Although Google may drive 90% of the traffic to a site, discovery tool use may be more valuable. In general, browsing seems to be decreasing, which indicates users are finding information less frequently by serendipity. Much web traffic is international, so geographic data are important. And world events can influence traffic; for example, when the Olympics were in China, there was a big decrease in traffic, and when storms keep people at home, traffic from that area decreases. Tracking social media is often a challenge. You must know your audience, define expectations, and document and categorize what works.

Using analytic data, future developments can be planned.  For example, when the ACS wanted to develop products for mobile platforms, the following data were used to prioritize which platforms to develop for first.

In conclusion, it is important to recognize that web analytics are only one source of data.  Others include focus groups, user testing, customer surveys, and direct feedback from sales teams.

Mike Sweet, CEO of Credo Reference, titled his talk “Web Analytics: Pragmatics Rule – ‘It’s the People’”. Many analytics tools give you a lot of data, but what does it mean? You need analysts to interpret the data and then figure out how to implement the results. If you try to focus on all the data that’s available, you won’t achieve anything.

Credo’s primary platform goal is to increase traffic.  They began by choosing a web analytics package that provided a number of techniques:  on-site web analytics, usability testing, focus groups, market scanning, but soon learned that Google Analytics was the best for their purpose.
Choose wisely: They chose the wrong package initially, found it not useful, and switched to Google Analytics. Focusing on collecting lots of data (especially from log files) and generating reports may obscure what is best for the business. It is best to define what you are trying to do on the site and determine whether you were able to do it. From this insight, you can make all the platform improvements you need to make. Prompting users to give you information on why something isn’t helpful is very simple to do and tells you a lot. Here are Sweet’s lessons learned:

  1. Assess where you are on the evolutionary curve.
  2. Choose packages and data mining projects carefully.
  3. Don’t plan to rely solely on on-site web analytics data.  Mix on-site and off-site data to get a complete picture.
  4. Assess your teams’ agility and your platform’s extensibility.  Only gather insights into things you are actually ready to act on.

Conclusions to ponder

  1. Don’t bite off more than you can chew.
  2. Numbers aren’t customers.  Use a larger approach to improvements.
  3. The experimenter’s mindset is key: get started and have fun!

Mark Johns, Manager, Publication Management Group at HighWire Press, noted that robots are widely used to crawl sites and obtain usage data for them. “Good” robots are good for business, so having your websites exposed to them is critical. The “not so good” robots include overzealous up-time trackers that hit a website so frequently that they cause problems for it, and malicious robots crawling the web.

The web is becoming more personalized and is molding itself to users in real time. Publishers cannot get direct access to information users because of the institutional purchasing model. It is time to start thinking about individual users. One example of this is the BBC, which lets users rearrange their home page to their liking (except for the ads). We know a lot about subscribers, so we can target things to them. But we also have data about anonymous users, such as their IP address, search terms, geographical location, language settings, and content viewed in a session. That metadata can therefore be used to create a user profile.

 

Semantic User Profiling
Jake Zarnegar, President, Silverchair

The shift to the institutional subscription has created a widespread problem of anonymous users making up the bulk of a site’s users. In an era when it is possible to track personal topic interests more closely than ever before, many publishers currently know less about their customers than ever before. Two ways to overcome this problem are to require users to give you information about themselves or to create a semantic user profile, invisibly to the user, based on their site intelligence. This can even be done for anonymous users, but this immediately raises questions about privacy. Silverchair has developed a statement on privacy that they use with their customers:

Silverchair privacy statement

How to build up profiles:

  • Have your content semantically tagged.  Semantics provide a normalized, logical metadata layer on content.
  • Accumulation: look at user interactions by accumulating the tags of documents they look at.  Build this over time and look for patterns.
  • Construct basic semantic profiles of users. Parse your raw logs into semantic profiles. The rules for doing this are proprietary and vary from one organization to another.
  • Create affinities and put profiles together. Affinities can be to topical interest groups, ads, products, events, individuals, etc. User affinities are constantly updating as the site captures more usage.
  • Use affinities to create personalized profiles for users, create a marketing campaign, or promote products when people come on to the site.
  • Use the resulting semantic profiles to understand your audience as individual information consumers, tackle the anonymous-user problem, and provide more detailed targeting for marketing and advertising efforts.
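
The accumulation and affinity steps in the list above can be sketched in a few lines of Python. This is only an illustration, not Silverchair's method: the talk noted that the real rules are proprietary, so the document tags, interest groups, session IDs, and the simple share-of-views scoring rule below are all hypothetical.

```python
from collections import Counter

# Hypothetical semantic tags attached to each document.
DOC_TAGS = {
    "doc-1": ["cardiology", "imaging"],
    "doc-2": ["cardiology", "pharmacology"],
    "doc-3": ["oncology"],
}

# Hypothetical topical interest groups, each defined by a set of tags.
INTEREST_GROUPS = {
    "heart-health": {"cardiology", "imaging"},
    "cancer-care": {"oncology"},
}


def build_profile(viewed_docs):
    """Accumulate the tags of every document a (possibly anonymous) user viewed."""
    profile = Counter()
    for doc_id in viewed_docs:
        profile.update(DOC_TAGS.get(doc_id, []))
    return profile


def affinities(profile):
    """Score each group by the share of the user's tag views that fall in it."""
    total = sum(profile.values())
    if total == 0:
        return {group: 0.0 for group in INTEREST_GROUPS}
    return {
        group: sum(count for tag, count in profile.items() if tag in tags) / total
        for group, tags in INTEREST_GROUPS.items()
    }


# Made-up page views, as if parsed from raw logs and keyed by session ID.
log = [("anon-42", "doc-1"), ("anon-42", "doc-2"), ("anon-7", "doc-3")]
sessions = {}
for session_id, doc_id in log:
    sessions.setdefault(session_id, []).append(doc_id)

profiles = {sid: build_profile(docs) for sid, docs in sessions.items()}
scores = {sid: affinities(profile) for sid, profile in profiles.items()}

for sid, group_scores in scores.items():
    print(sid, group_scores)
```

Because the profile is just a running tag count, it can be updated as each new page view arrives, which matches the point that affinities keep changing as the site captures more usage.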

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor