What Data to Capture

Let’s forget about the how for the moment, about all that technological whizbang, and talk about the what. What data do you want to capture because you feel it will be useful, now and after you are gone, in organizing, reconstructing, and illuminating the many facets of your life? In other words, what data do you want to constitute your e-memory, to borrow a term from Total Recall (the book, not the movie)?

Well, a point made more than once in that same book is that it’s tough to say what data will be important in the future, and it’s cheap to store it anyway, so capture and save basically everything you can. The authors’ examples on this point are quite practical: Who can say for sure what the IRS might demand in an audit?

I think it goes deeper than that. Just based on my experiences so far of scanning and converting to searchable text five or six years’ worth of financial data, it’s become clear that every receipt, every scrap of paper, might have future utility in jogging memories, in helping me and my progeny reconstruct my life, my whereabouts, my thoughts, my emotional frame of mind, and so on. Just scan it, or otherwise capture it. Just throw it all into the hopper. The software for organizing all this data, for finding connections between all those disparate data points, will only get better.

Now, you can make all this data as private or public as you wish, but think about things for a moment from the perspective of a future historian. If that historian is attempting to reconstruct the life of an average citizen of, say, Amsterdam in 1580, he or she probably has remarkably little data to work with: a few city records of birth, marriage, death, taxes paid, property purchased, and so on. Perhaps, but more than likely not, the subject left behind a journal and some correspondence. That would be about it.

Vast swaths of that subject’s life experience are simply unrecoverable. It’s as though they never happened.

Contrast this with what the historian could glean about a day in the life of an average person today from an industrialized part of the world who bothered to deliberately capture even a small portion of the data on the list below. From even just the relatively small amount of data I’ve started to centralize from my finances, e-mail correspondence, journal entries (yes, I keep one, on and off), photos, etc., I could pick any day in my life in, say, 2004, and recover a startlingly complete picture of my activities and interior thoughts (where I was; what I ate; who I was with, called, wrote, and electronically chatted with; what was said during some of those conversations; what I was thinking about; what my emotional state was at the time; and so on).

OK, but what’s a good sort of minimal checklist? I’m going to start one here. I anticipate that I’ll come back here often and append items to the list. I’ve loosely organized things based on their ease of capture. If a data stream is already in digital form, and it’s simply a matter of centralizing it, I’ve listed it first.

Some of these data streams are only going to get easier to capture with the advent of commercial products like wearable cameras and wearable/implantable biometric sensors.

Textual Data

  • E-mail.
  • Electronic chat sessions.
  • Calendar entries.
  • Finance data in Quicken and similar formats.
  • Scanned financial records (bank statements, receipts, tax filings, etc.; everything in that big, ugly filing cabinet in your home office).
  • Scanned medical records (hospital billing records, insurance correspondence, etc.).
  • Scanned journals.
  • Scanned correspondence.

Visual Data

  • Digital photos.
  • Digital videos.
  • Webcam feeds (yours and those maintained by others that might help paint a fuller picture of your surroundings).
  • Surveillance video of you in public spaces (if you could only access that data!).
  • Digitized video tapes.
  • Scanned photos and slides.

Audio Data

  • Conversations via VoIP (Voice over Internet Protocol) services like Skype.
  • Taped dictation, journal, or correspondence.

Biometric Data

  • Output from heart rate monitors and other similar devices worn during workouts.

Spatial Data

  • GPS (Global Positioning System) data collected from various mobile devices.

(On the subject of spatial data, I was thinking today how cool it would be to write a script to pull location data out of e-mail headers based on the domain name of the SMTP server.)
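Here is a rough sketch of what that script idea might look like in Python, using only the standard library's email module. It reads the Received headers of a raw message and collects the mail server host names it finds. The function names, the regular expression, and the sample message are all my own inventions for illustration, and the step from domain to place is the shaky part: the lookup table is a hypothetical one you would have to build yourself, and taking the last two labels of a host name as "the domain" is a naive assumption that breaks for suffixes like co.uk.

```python
import re
from email import message_from_string

# Assumption: host names appear after "from" or "by" in a Received header.
# IP-only entries are skipped because the pattern requires an alphabetic TLD.
RECEIVED_HOST = re.compile(
    r"\b(?:from|by)\s+([A-Za-z0-9.-]+\.[A-Za-z]{2,})", re.IGNORECASE
)


def relay_hosts(raw_message):
    """Return the unique mail server host names found in Received headers."""
    msg = message_from_string(raw_message)
    hosts = []
    for header in msg.get_all("Received", []):
        for host in RECEIVED_HOST.findall(header):
            host = host.lower()
            if host not in hosts:
                hosts.append(host)
    return hosts


def guess_locations(hosts, known_locations):
    """Pair each host with a place, using a caller-supplied domain table.

    known_locations is a hypothetical dict such as {"example.net": "Seattle, WA"}.
    The "domain" is naively taken to be the last two labels of the host name.
    """
    found = []
    for host in hosts:
        domain = ".".join(host.split(".")[-2:])
        if domain in known_locations:
            found.append((host, known_locations[domain]))
    return found


# A made-up message for trying the functions out.
SAMPLE = (
    "Received: from mail.example.org (mail.example.org [192.0.2.10])\n"
    "\tby mx.example.net with ESMTP; Mon, 5 Jan 2004 10:00:00 -0800\n"
    "Received: from laptop.example.org by mail.example.org;"
    " Mon, 5 Jan 2004 09:59:58 -0800\n"
    "From: me@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
```

Calling relay_hosts(SAMPLE) gives the three server names in the order they appear, and guess_locations(...) keeps only the ones whose domain is in your table. Keep in mind that a server's domain tells you roughly where the mail provider is, not necessarily where you were sitting when you hit send.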

Personal Digital Forensics Data

(What metadata could you start capturing from all the computers and mobile devices that you now use?)

  • Web browser history.
  • OS logs (detailing application startup and stop times, etc.).

Public Digital Forensics Data

(What data did you leave behind as part of your interactions with the Internet?)

  • Blog and online forum posts.
  • Written contributions to social networking sites.