Archive for the ‘Personal Archiving, Feb. 2011’ Category

Three speakers discussed how people archive their files.  Devin Baker, Digital Librarian, University of Idaho, and Collier Nogues, University of California, Irvine, studied how writers manage their data.  Writers serve as focusing agents because so much of their work exists only digitally and is very valuable to them.  Many of them simply save each succeeding version of a file over the previous one, which raises the question, “Is CTRL-S poor archiving practice?”  The study by Baker and Nogues revealed that about half of the respondents primarily save over their files periodically, but only 21% do it all the time.  They have little sense of “file management” and save files in a wide variety of media–thumb drives, laptops, desktop PCs, and in e-mail.  Writers are good at backing up files, but the plethora of backups leads to problems with file management; 31% of writers said that they do not keep track of different versions of files saved in more than one location.  Not only that, but unconventional file-naming practices are prevalent.  About half of the authors said they use e-mail as an archiving method; a major issue, therefore, is how we will archive writers’ correspondence.  Can we recover some of what has been lost already?

Hong Zhang

Hong Zhang, a doctoral student at the University of Illinois, has also been studying file naming and archiving practices.  File folders are workspaces; old files implicitly become archives.  Problems arise when people forget where they put files and cannot find them again.  Often, one can determine the type of archive based on how the user has named the file.  For example, a file called “xxx-old” is generally for something that has been completed or is not expected to be referred to again in the near future.  Incorporating the date into a file name may indicate an implicit archive; for example, “2007 expense forms”.  This naming convention works for a while, but if the files are moved or the user reorganizes the computer, the system may break down.
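The two naming cues mentioned above can be illustrated with a small Python sketch.  This is not Zhang’s method; the function name `classify_name`, the labels it returns, and the exact patterns are assumptions made up for illustration, based only on the two examples in the text (an “-old” suffix and a year in the name).

```python
import re
from pathlib import PurePath

# Hypothetical patterns, derived only from the two examples in the text.
OLD_SUFFIX = re.compile(r"[-_ ]old$", re.IGNORECASE)    # e.g. "xxx-old"
YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")       # e.g. "2007 expense forms"


def classify_name(name: str) -> str:
    """Guess whether a file or folder name hints at an archive.

    Returns one of three illustrative labels (assumptions, not terms
    from the study): "marked-old", "dated", or "unmarked".
    """
    stem = PurePath(name).stem  # drop a trailing extension, if any
    if OLD_SUFFIX.search(stem):
        return "marked-old"
    if YEAR.search(stem):
        return "dated"
    return "unmarked"


if __name__ == "__main__":
    for example in ["xxx-old", "2007 expense forms", "draft.txt"]:
        print(example, "->", classify_name(example))
```

A heuristic like this depends entirely on the name, so it shares the weakness noted above: it says nothing once files are renamed, moved, or reorganized.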

Jason Zallinger, a graduate student at Rensselaer Polytechnic Institute, has been studying Gmail as an archiving method.  With large amounts of storage available to users, Gmail has become a “storyworld”.  Zallinger interviewed 6 users between ages 27-39 about their Gmail accounts.  He concluded that we are now all digital storytellers, historians, and autobiographers of our own lives and have become good at capturing digital data; however, we are not good at making sense of it all.  Thousands of clues to our life stories are sitting in our archives; how do we design systems for the desire to save information but not look at it?

Zallinger suggested that a “Forget” button would set a reminder on old e-mails and give the user a gentle reminder in several years to clean out what is not useful.  He also mentioned other interesting e-mail tools to help users.  Mail Goggles gives users some simple math questions to solve before the mail is sent, which may prevent e-mail users from sending messages they regret later.  Zallinger has also created a blog to document his experiences in creating an open source system to turn Gmail archives into a simple game and make them into a story.  He also has created a Wordle from his e-mails.

Visuals are powerful memory clues, and Cathal Gurrin and Aiden Doherty at Dublin City University have taken the collection of life stories to a whole new level by using wearable cameras (called SenseCams) that take about 3 pictures a minute without user interaction and capture everything they did in a day.  The cameras have sensors that trigger the captures, and have been augmented with GPS and Bluetooth devices to identify activities, personal interactions, e-mails, etc.  They do not record audio because they found that even though people do not seem to mind cameras, they will stop talking if they are being recorded.  Gurrin now has an archive covering 4.5 years that contains over 7 million photos.

Cathal Gurrin wearing his camera and holding a GPS device

A major need is to build a search engine to search this vast archive.  How can it all be organized?  One way is to designate important events and then search for them.  Another is to search automatically identified activities.  Gurrin’s research group was able to search for one event in the 30,000 stored over the past 2.5 years and retrieve it in about 2 minutes.  They have published about 40 articles about their research on visual lifelogging and their experience with the SenseCam.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

What is everyone doing with all those cheap cameras out there?  Daniel Reetz noted that digital cameras have become cheaper than textbooks!  Very cheap cameras can change the world (not always in ways we desire!).  Cameras have always defined the aesthetic of our memories.  The aging of photos has aged our memory of past times.  The problem with cheap cameras is that they take poor pictures because of the need to keep production costs low.  They contain internal software to correct the results of the optics, so photos are processed to appear sharper than they really are.  Those pictures will one day comprise the archives of our times.  The potential for personal archiving is unbounded if imaging efforts are focused on more useful activities than simple photo retouching or changing the colors of lawns and skies to make them more appealing to consumers.  Reetz has used camera technology to build a low-cost do-it-yourself book scanner that uses cheap cameras and free software to scan books quickly and efficiently.  (See an article in Wired magazine for details and photos.)

Dwight Swanson

Dwight Swanson described some of the activities of the Center for Home Movies (CHM).  The Center’s mission is to “collect, preserve, provide access to, and promote understanding of home movies and amateur motion pictures.”  Regional film archives have put time and effort into making home movies available, but none of the existing ways of archiving are adequate for long-term access, so there are relatively few home movies online.  Part of the reason is that there are very few film transfer companies serving the general public, and thus our understanding of home movies is limited.

CHM is working to increase the availability of home movies and has organized a Home Movie Digitization and Access Summit.  The first Summit was in September 2010 at the Library of Congress’s campus in Culpeper, VA.  It drew 46 attendees (filmmakers, film transfer companies, and stock footage vendors) and considered the following questions:

  1. A taxonomy for home movies.  The Library of Congress asked the Center to develop such a taxonomy.
  2. Cataloging and description issues.  How do home movies differ from other collections?
  3. Legal issues.  Terms of use, privacy, rights issues of orphan films.
  4. Technical issues.  Comparison of film digitization systems, recommended standards.
  5. Uses and users.  Why do home movies matter?  What is the current state of scholarship?  Who is using home movies and what are they looking for?
  6. What is the role of the Film Collectors’ Community and how can they be engaged?

Rich Gibson

Rich Gibson spoke about the Gigapan Project, which produces highly detailed panoramic images.  A “gigapan” combines a way to capture the multiple images that make up a panorama, software to stitch them together, and a website for viewing them.  The website gives users free accounts and is a community for sharing GigaPixel imagery.  An uploader, currently under development and expected to become available in August, will provide users with an easy way to upload their images, and a “GigaPan Stitcher” will also be available to allow them to create the panoramas from individual images.

GigaPan images will change the way we see the world.  Our world is a set where we live our lives and is a museum for the artifacts we collect.  We archive because things change.


Clifford Lynch

Clifford Lynch, Director of the Coalition for Networked Information, keynoted the second day of the PDA conference.  He said that we are moving into a second generation of understanding personal digital archives, where the complex of ownership and control is not clearly understood.  We find that the shared version of a collection has more value than a personal version because of the context and commentary associated with it.  Shared spaces are vulnerable platforms–we will see more sudden shutdowns of platforms that aren’t financially viable.

Personal material is at most risk when someone moves from one job to another. Things get lost in the transition.  Platform migrations of all kinds are periods of considerable peril for the continuity of this kind of material, which is something we need to think very carefully about.  The average length of a user’s relationship with a social platform may be determined by the emergence of new platforms.  We do not understand the relationship with shared spaces for personal archiving very well.  We need “Archive Me” buttons on many more things!

We have a notion of a “public life”: a minimum record of someone’s life that is public, such as birth/death dates, public offices, or children’s names.  We have built up many systems to record those things, which are becoming much more open and extensive.  Look at the number of biographic entries in Wikipedia, for example.  In the higher education world where tracking publications is important, unique author IDs are being developed.  There is a move to make some activities more transparent and public.  We need to think about how these spaces interconnect to the general infrastructure of society that is bound up in identity, genealogy, publication, and information dissemination.  There is a strong linkage that will need considerable study.  What’s the public part of a life?  Do we have social or legal consensus on that?  How does that connect with shared social spaces?

If we simply extrapolate from the challenge of personal papers and try to shoehorn the development of shared social spaces into a framework, we will miss a tremendous number of the key issues.


Thursday evening of the PDA conference featured “fast talks”–brief presentations delivered in a minimum of time.  Some very useful personal archiving products and services were described.  Here is a brief summary of 3 of them:

  • AboutOne:  Joanne Lang, a former executive, founded AboutOne to help busy people control all aspects of their records.  According to the product website, she “wanted to capitalize on her experience with cloud technology to help busy moms manage family life. She had seen how cloud computing and business software allowed businesses to eliminate mundane tasks and gain new levels of efficiency, and she wanted to bring those same benefits to families.”  AboutOne is a secure and private subscription service ($5/month, $30/year) that links all types of data and allows it to be easily entered as it happens, from anywhere.
     
  • Personal Archiving Day: The Library of Congress will host its 2nd Personal Archiving Day–an open house for the public on saving your digital information–on April 22.  It also maintains an extensive website on personal archiving, which has sections on digital photos, audio, or video; e-mail messages; personal documents; and websites.  The most popular topic by far is digital photos.
     
  • The Rosetta Project: Laura Welcher, Director of the Rosetta Project, described some of its activities.  Supported by the Long Now Foundation, the Rosetta Project is a global collaboration of language specialists and native speakers working to build a publicly accessible digital library of human languages.  One of the project’s major missions is to draw attention to the drastic loss of many of the world’s languages; a static wiki of data about each language was built from data in Wikipedia.  Recently the project moved its content into the Internet Archive.  
     


Brian Fitzpatrick

Data liberation refers to getting your data back out of places you have stored it in the cloud.  In the first day’s endnote address, Brian Fitzpatrick, Engineering Manager, Google, Chicago and founder of the Data Liberation Front, said that Google feels that a user should be able to control the data they store in any of Google’s products, and his team’s mission is to make it easier to move data in and out of their services.  Here is his team’s logo:

Data Liberation Logo

Why should a company help users remove their data?  It’s not for altruistic reasons; they benefit from it because it increases user trust.  Companies should develop tools making it easier for a user to leave.  Locking data in is not a valid business model.  Never in history has a distribution method like the Internet existed; it is almost free and breaks all the rules.  Google’s aim is to make products so good that users do not want to leave.  The new lock-in is innovation; focusing on building walls and locking doors to the data makes you vulnerable to innovators who will figure out ways to allow users to remove their data.

Most users don’t think about data liberation until they want to leave, but they should ask these questions before they put their data into any system:

  • Can I get my data out at all?
  • How much will it cost to get my data out?
  • How much of my time will it take to get my data out?

Some people aren’t comfortable with putting their data in the cloud, but the reality is that it’s safer there than on your laptop.
