Archive for the ‘Personal Archiving, Feb. 2011’ Category

The endnote address at the PDA conference was given by Rudy Rucker, Sr., a science fiction author, who spoke about Lifebox Immortality.

Rudy Rucker

We dream of achieving immortality by creating a software copy of ourselves.  We don’t know how the brain stores information, so the closest thing we can do is to find a way of getting a big archive of our thoughts and memories, and put in some tags and links.  It’s hardly practical to do it yourself–you need to find a way to automate the process.

Rucker’s book, The Lifebox, the Seashell, and the Soul, incorporates the concept of a “Lifebox”, which is a really good personal digital archive, or a digital hyperlinked copy of a person’s memory, in which you have a lot of data. It is easy to search your Lifebox data with Google Search.  People could create memoir-like structures using the Lifebox.  Or you could create a chatbot where you could type in any question and get an answer.  You will find that people will not usually answer your questions directly; instead, they will say something that relates to what you ask.  So you must ask again and keep the conversation going.  But this is not a stand-in for yourself.  It is hard to write your life story.

One of the secrets in writing is to write like you talk.  You could use a cell phone-like device and just tell stories about yourself, but the missing thing is the spark.  Intelligence is mainly many evolving neural networks; there is no underlying theory about creating intelligence. The secret of the Lifebox is to save your memories.  You don’t have to do much more than create the data because we are tuned to emulate other people.

E-mail archives are dangerous–there could be things in the archive that you wish you had not sent.  There is no standard way of making your Lifebox.  Lifeboxes are a way of self-expression.


I found the PDA conference fascinating and highly educational.  I never realized that personal digital archiving had so many implications or applications–certainly many more than simply digitizing a box of old family photographs (although that is of course included).  It’s relevant to information professionals, especially those who interact with the public, because awareness of it is growing, and the number of people doing it is growing rapidly.  And it is a whole new type of information with its own issues and applications.  I look forward to the next conference.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

What is the proper boundary between public and private data?  How far should archivists go when collecting what might be private data?  These questions introduced this session.  The first presentation was a discussion of archival applications of digital forensics tools and techniques by Kam Woods, a Postdoctoral Research Associate at the School of Information and Library Science, University of North Carolina at Chapel Hill.  His research focuses on developing techniques and tools to assist in long-term archiving and educational support for digital forensics datasets.

Kam Woods

Here are the differences between digital forensics and archiving.

Digital Forensics vs. Archiving

The main thing with archiving is to know what you have been given.  Archivists are increasingly finding themselves dealing with streams of heterogeneous data.  It is important to reduce risk in the acquisition process and maintain the integrity of the data.  Private and sensitive data must be appropriately protected, and the authenticity and chain of custody (patterns of use and activity) of the data become important.

Advanced forensic formats include raw streams of data, cryptographic hashing, and metadata.  Woods is working on a bulk extractor which will process data and produce Dublin Core metadata and digital forensic XML (DFXML).  The open source code and APIs of this and other related programs are available here.
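
As a rough illustration of the role cryptographic hashing plays in maintaining the integrity of acquired data, here is a minimal Python sketch.  It is not Woods's tooling; the function names and the sample bytes are hypothetical.  It records a SHA-256 digest when material is acquired and checks it again later to see whether the bytes have changed.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    # Reading in chunks keeps memory use flat even for large disk images.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(stream, recorded_digest):
    """Compare a stream's current digest with the one recorded at acquisition."""
    return sha256_of_stream(stream) == recorded_digest


# Hypothetical sample data standing in for a donated disk image.
acquired = b"raw bytes from a donated disk image"
recorded = sha256_of_stream(io.BytesIO(acquired))

print(is_unchanged(io.BytesIO(acquired), recorded))         # same bytes
print(is_unchanged(io.BytesIO(acquired + b"!"), recorded))  # altered bytes
```

A digest recorded at each hand-off gives a simple way to show that the data was not altered between custodians, though on its own it says nothing about who handled the data or when.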

The Personal in the Organizational

What happens to personal data that is embedded in company records when the company fails?  Sam Meister, Digital Archivist and Consultant to the Sherwood Project at the University of Maryland, has looked at this question.

Sam Meister

The Sherwood Archive Project, run in cooperation with Sherwood Partners in Mountain View, CA, offers a private alternative to public bankruptcy.  It attempts to save business records by investigating the potential to preserve the “abandoned” records of failed companies.  When a company goes into bankruptcy, it assigns its data to Sherwood, which takes over ownership and tries to sell the intellectual property.  Personal information on employees, suppliers, and users is often found in the records of the company, which raises issues about the disposition of the data.  Records of startups tend to be particularly messy and difficult to deal with.

There is often not much regulation of this data, so disposition becomes an ethical issue. Codes of Ethics are available from the Society of American Archivists and the International Council on Archives.  It is necessary to establish a relationship of trust between the original donor (which is not Sherwood) and the archive, and it is difficult to know what is in the records.  If companies did not collect employee records, that would eliminate many concerns, but it would also damage links between those records and others in the collection because personal identifiers are often a major way records are linked together.

Technological solutions to these problems are available, but we must be sure that all the personal information is removed. One possibility is an initial period of restricted access until the data has been examined.  In all these issues, there is a private to public transition, with a need to establish trust between private collections and repositories.
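
One possible technological approach, sketched here in Python purely as an illustration and not as anything Meister presented, is to replace personal identifiers with keyed pseudonyms.  The same person always maps to the same pseudonym, so links between records survive, while the identifier itself is no longer readable.  The field name `employee_email`, the key, and the sample records are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA-256)."""
    # Normalizing first means trivially different spellings still link together.
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()[:16]


def redact_records(records, secret_key, id_field="employee_email"):
    """Return copies of the records with the identifier field pseudonymized."""
    redacted = []
    for record in records:
        copy = dict(record)
        copy[id_field] = pseudonymize(record[id_field], secret_key)
        redacted.append(copy)
    return redacted


# Hypothetical key held only by the archive, and hypothetical sample records.
key = b"held-by-the-archive-only"
records = [
    {"employee_email": "pat@example.com", "doc": "offer letter"},
    {"employee_email": "Pat@example.com ", "doc": "expense report"},
    {"employee_email": "lee@example.com", "doc": "design memo"},
]

for row in redact_records(records, key):
    print(row)
```

This only covers a structured identifier field.  Personal information buried in free text would still need to be found and reviewed, which is one reason a period of restricted access remains useful.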

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Personal health data has many unique issues, especially in the privacy area.  Gordon Bell leads a research project at Microsoft Research, MyLifeBits, which is aimed at capturing everything in a person’s life.  According to the project website, Bell has collected “a lifetime’s worth of articles, books, cards, CDs, letters, memos, papers, photos, pictures, presentations, home movies, videotaped lectures, and voice recordings and stored them digitally.  He has become virtually paperless and is now beginning to collect phone calls, IM transcripts, television, and radio.”  The entire collection comprises 200,000 items and 100 gigabytes of storage.  An article in the March 2007 issue of Scientific American describes some of his work.

Gordon Bell

At the PDA Conference, he said that with the digitization, capture, and storage of all personal information, we are now realizing Vannevar Bush’s famous Memex vision, and he suggested that his work would find practical applications in collecting personal medical archives.  He said that the SenseCam (see the earlier presentation by Cathal Gurrin) is a killer application for recording health information.  One of the most important issues is privacy, and doing nothing about it is fine according to Bell.  He also thinks that no single vendor will ever be able to solve all of the needs in health archiving.  Bell has scanned his entire health history (even back to a letter recording a 1942 visit to the Mayo Clinic!).  The archive has 400 files and 1 GB of images.  He regrets that he ever threw anything away!

Bringing personal health archiving to the masses

Khaled Hassounah

MedHelp claims to be the largest online health community, with 12 million unique visitors a month, and is growing at a rate of 40,000 new users a month.  It has over 300 forums, partnerships with leading medical institutions, and over 200 experts who respond to users’ questions.  Khaled Hassounah, MedHelp’s CTO, explained the difference between electronic medical records (EMRs) and personal health records (PHRs).  An EMR is a digital record of your interactions with a healthcare provider.  It is stored by the provider, and access to it is regulated by law.  In contrast, a PHR is created and maintained by the patient, who controls access to it.

MedHelp provides tools for tracking and sharing one’s health data.  About 500,000 people have been using the system to track themselves over the past 2 years.  When MedHelp was first developed as a purely archival system, usage was low because people are interested in managing their health, not an archive, which was seen as a by-product.  Sharing was not an issue; privacy was seen as selective–an option, not a restriction, and not as important as financial data.  Once capabilities to track actual medical data and, importantly, play it back were added, usage increased dramatically.  85% of the trackers that have been created are public; on average, each user has 2.3 trackers.  Mobile tracker usage is exploding.

Many different variables can now be tracked on the system; for example, here is one person’s water consumption tracked over the past 2 months.

Here are some of the lessons learned as a result of this experience.

Electronic Medical Records (EMRs)

Linda Branagan

Linda Branagan, Director, Telemedicine Products, Medweb, expanded on the discussion of EMRs and PHRs.  She noted that there are 3 types of PHRs:

  • Type 1 PHRs are patient owned and maintained.  They are a digital record of all one’s interactions with a healthcare provider and may be provided to the patient by the provider or a third party.  They are not covered by HIPAA privacy laws.  Both Google and Microsoft have developed products for maintaining these types of PHRs.
  • Type 2 PHRs are “tethered” to an EMR system.  They provide data on appointments, lab results (especially if they are normal), prescriptions, etc. and are generally stored and maintained by healthcare providers.
  • Type 3 PHRs are a self-created data store like MedHelp.

PHRs have not been universally embraced by healthcare providers because providers do not trust them and also because of a fear of liability issues (the data could be used by lawyers in litigation, for example).  On a positive note, however, some providers are making use of PHRs because they can prevent duplicate tests, and they are useful in coordinating care in complex cases.

The amount of personal health information in digital form continues to increase.  The decision whether to make it public is a personal one.  Things to consider are the risk of disclosure vs. help from family and friends and the usefulness it would provide to healthcare providers.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

Digital forensication refers to using digital methods, concepts, and tools in contexts other than criminal investigations.  According to Cal Lee from the University of North Carolina (UNC),  efforts have recently been made to connect the forensics and archiving areas.

Institutions are receiving media and want to collect the online traces of individuals.  In response, UNC has created a learning laboratory for studying the application of digital forensics to the acquisition of digital materials, including training, hardware, and software for running exercises.  It also includes creating, annotating, and disseminating data that can be used for instruction.  The Digital Corpora website has a variety of case studies and training aids, such as typical scenarios and problems to be solved.

Here is Lee’s vision for the future of digital forensication.

Personal digital archiving, the diminishing information age, and the archival paradigm

Richard Cox

Richard Cox, Professor of Archival Studies at the University of Pittsburgh (his home page is here), raised challenges for archivists in today’s information environment.  He said that it is easy to become so immersed in technology that we ignore what we know about life.  We are not listening to each other and exchanging information.  Archivists are a rarefied group.  How do others see us?

One of today’s realities is that people are losing confidence in their ability to access information.  For example, information is diminishing as e-books proliferate: there is no sensation of page turning, you cannot comment, etc.  We are losing artifactual information and physical evidence of memories.  Publishers and university presses are failing, and they are not being replaced by new forms of e-scholarship.  The world of information is a free-for-all, so we need librarians more than ever.  Bookstores are taking the place of libraries, but there are no librarians in them!  Students don’t know how to read and think because slow reading is disappearing.  Independent bookstores have been driven out by big chains, which are now in trouble.  You can look at books in stores, but you can’t do it online.  These closings have negative impacts in communities.

Printed newspapers and journals are also disappearing and have been replaced by online versions and blogs.  Blogs are not built on reliable information, so this is another area of declining information.  Places that educate people have changed from library schools to information schools, which don’t have the focus on reading courses, etc.

Archivists worry about collecting things.  We must begin to be enablers, not acquirers.  People are becoming worried about what might happen to their data.  Individuals’ projects don’t solve problems or know what came before.  We must think more deeply and broadly.  What we know of the world comes through words.  What are we writing?

Archival Sense-Making

Mark Matienzo

Amelia Abreu

Mark Matienzo, Digital Archivist in Manuscripts and Archives, Yale University Library, and Amelia Abreu, a Ph.D. candidate at the University of Washington Information School, said that in order to make sense of our archives, they must include artifacts as well as documents.  All archival acts are explicitly historical, and all information in them is viewed as subjective.  In personal digital archives, profiles consist of collections such as scrapbooks, etc.  The context of collecting is significant.  Sense-making shows promise in personal digital archives and can be applied to many archival activities.  It will make archives become more than archives of facts and instead make them “archives of feelings”.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor

One of the highlights of the PDA conference was a session in which 3 computer industry pioneers not only provided a look back in time but also gave their perspectives on some of the issues of today.  The pioneers were Ted Nelson, founder of Project Xanadu (the first computer hypertext project); Ed Feigenbaum, often called the “father of expert systems”; and Christina Engelbart, appearing on behalf of her father, Doug Engelbart, inventor of the mouse.

Ted Nelson

Ted Nelson calls himself “the industry’s loyal opposition” and has strong opinions about today’s digital world.  He said that we are being assaulted by fads and inane technologies and that intelligent software is software that takes control away from users.  Our worst problem now is the myth of technology.  Most things that people call technology are packages, conventions, and constructs.  E-mail is not a technology; the technologies making e-mail possible are TCP/IP and file transfer.  Windows and Macs are just packaging; underneath they are all the same.

What convention should we have for documents?  In the Middle Ages, documents had writing in the margins.  Why is it that in PDF and HTML documents, you cannot put notes in the margin?  Because the developers thought that users would not need that capability, so they did not design for it.  Nelson is now developing a digital document browser that generalizes side-by-side views so you can see the connections between documents.

The computer world is built on hierarchies and rectangular objects.  Where are the connections between them?  The initial file consisted of punched cards in a box with a name on the end.  Then came a file containing the names of all files, which led to hierarchical directories.  Nobody foresaw that we would want to store photos, audio files, documents, etc.  The only thing we can count on being permanent is filenames.  Project Xanadu was an attempt to improve on paper and show the thoughts of the writers.  The structure of documents was lost when fonts were displayed on the screen.  All this was an imitation of paper and did not improve on it.

Edward Feigenbaum

Ed Feigenbaum is a pioneer in the development of artificial intelligence and has been called the “father of expert systems”.  He is now Professor Emeritus at Stanford University and has continued working on knowledge systems.  His latest work is on the Stanford Self-Archiving Legacy Toolkit (SALT) project, now being developed at the Stanford University Libraries and Academic Information Resources (SULAIR).  In this case, Feigenbaum defines “self” as “emeritus professors with archives worth preserving for scholarly use and doing the preservation with only a little help from professional librarians”.  “Toolkit” means the software and web page formats to facilitate the process.  SALT has taken a “Janus” approach with two faces:  one facing outward toward the researchers and scholars of today and of coming years, and the other facing inward toward the repository to facilitate the work of the scholars building it.  The inward face uses Zotero as a tool for entering the metadata.  The two faces communicate regularly to update the repository and synchronize it with the Zotero cloud servers.

SALTworks is the name of the experimental system supporting searching of the Feigenbaum Digital Archive, which contains 15,000 documents.  It is also being used by other scholars on their own archives.  A major lesson for library archivists is to get to scholars while they are alive and can supply content, stories, and metadata for their archives.  It is much harder to compile archives after the scholars are dead, and the resulting archives will not be as rewarding for the scholars and students of the future.

Christina Engelbart

Christina Engelbart appeared on behalf of her father, Douglas Engelbart, with whom she still works.  Doug Engelbart’s strategic vision catapulted us into the information age.  The day after he became engaged, he started thinking about his life’s goals.  He wrote a report on human intellect, and then invented the mouse in 1964.  He made one of the first transmissions over the Internet in 1969 and was one of the first proponents of “boosting our collective IQ”.  He founded the Doug Engelbart Institute in 1988 to preserve the history of his work.

Engelbart began archiving his materials early in his career, and re-archived everything on the web in 1995.  The Stanford Mouse Site tells the history of his invention of the mouse and contains many original materials.  The Computer History Museum in Silicon Valley has a replica of the original mouse and other materials.  Many of Engelbart’s materials, including his early videos, are in the Internet Archive.

In developing a scholar’s archive, context is everything.  What is their story, and what were they thinking?  Archives can humanize history; people use things today that can be traced back to an idea (for another example, see the Harold Edgerton Digital Collection).  Technology supports how we work together, and everyone is part of the collective intelligence of the world.

Don Hawkins
Columnist, Information Today and Conference Circuit Blog Editor