Solutions for Storage
I’m still in the early stages of tech spec-ing my ideal lifeblogging system, but one obviously required component is storage, potentially a lot of it.
Bell devotes quite a lot of pages in his book to quantifying how cheap storage has gotten and how it’s only going to get cheaper. Ray Kurzweil will happily tell you it’s going to get exponentially cheaper and help us achieve the singularity in our lifetime.
Still, it’s not free. You’re going to have to trade or barter something for it.
It’s likely to be worth some outlay, though, to be able to conveniently centralize your life data. If you’re like me, you already have masses of binary data, but it’s spread across multiple devices. Insights about your life could come quickly if you could only corral it all into one place, start processing it (with passes for optical character recognition, pattern recognition, speech-to-text, and so on), and come up with a means to index and search it all (a “Google” for your life data, in other words).
As part of my day job, I install and maintain large-ish (~200 TB), high-performance RAIDs (Redundant Arrays of Inexpensive Disks), so I know something about storage, though I won’t claim to be an expert. I don’t need a RAID anywhere near that large, but I knew even building a small one (4 TB was my goal to start my “LifeDataHopper”) could get expensive when you factor in a host machine that has a decent motherboard, a good PCIe RAID controller card, a SATA disk enclosure/backplane, 1+ TB drives, etc.
So I started this weekend just Googling for much cheaper, software-only RAID solutions. In keeping with my mantra to repurpose what hardware I can, I even toyed with the idea of using a subnotebook PC I have lying around as a host, buying a cheap USB hub, attaching what external USB hard drives I already have, and striping them all together as a RAID5 array. Linux makes it simple to make a disk array out of basically any storage devices it detects (it doesn’t care if they’re from the same manufacturer or even if they’re the same capacity). You can then keep growing the RAID by adding additional drives and rebuilding. For example, here’s a how-to from a guy who made a RAID out of a bunch of USB thumbdrives.
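One caveat with mixed capacities is worth a quick sketch. Linux will accept mismatched drives, but a RAID5 set generally treats every member as if it were the size of the smallest one, with one member's worth of space going to parity. The function name and the drive sizes below are made up for illustration; this is back-of-the-envelope arithmetic, not a model of any particular RAID implementation.

```python
def raid5_usable_tb(drive_sizes_tb):
    """Usable capacity of a RAID5 set, in the same units as the input.

    Assumption: every member contributes only as much space as the
    smallest drive, and one member's worth of space holds parity.
    """
    if len(drive_sizes_tb) < 3:
        raise ValueError("RAID5 needs at least three drives")
    smallest = min(drive_sizes_tb)
    return smallest * (len(drive_sizes_tb) - 1)


# Hypothetical mix of external USB drives, sizes in TB.
mixed = [0.5, 1.0, 1.0, 1.5]
usable = raid5_usable_tb(mixed)
print(usable)               # 0.5 * 3 = 1.5
print(sum(mixed) - usable)  # 2.5 TB unused or spent on parity

# Growing the array by one more 0.5 TB drive adds one more 0.5 TB slice.
print(raid5_usable_tb(mixed + [0.5]))  # 0.5 * 4 = 2.0
```

So a grab bag of drives works, but the smallest one sets the pace: with these made-up sizes, 4 TB of raw disk yields only 1.5 TB of usable space.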
Cool, but the problem with using USB as the connection method is that its read/write times are too damn slow, as many people point out here. You’d at least need a lot of USB controllers on your motherboard, and even then USB2 isn’t such a fast protocol.
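To put a rough number on "too slow": USB 2.0 has a nominal signaling rate of 480 Mbit/s, and drives hanging off one hub share that single link. The sketch below estimates how long it would take to push 1 TB through it. The `efficiency` parameter is a hypothetical fudge factor for protocol overhead, not a measured value.

```python
USB2_SIGNALING_MBIT_S = 480  # USB 2.0 "Hi-Speed" nominal signaling rate


def hours_to_transfer(terabytes, efficiency=1.0):
    """Hours needed to move `terabytes` (decimal TB) over one USB 2.0 link.

    `efficiency` is an assumed fraction of the nominal rate that is
    actually achieved; 1.0 is the theoretical ceiling.
    """
    bytes_per_second = USB2_SIGNALING_MBIT_S * 1_000_000 / 8 * efficiency
    return terabytes * 1e12 / bytes_per_second / 3600


print(round(hours_to_transfer(1.0), 1))       # 4.6 hours at the theoretical ceiling
print(round(hours_to_transfer(1.0, 0.5), 1))  # 9.3 hours at an assumed 50% efficiency
```

Even in the best case, reading or rebuilding a single 1 TB drive ties up the bus for several hours, which is why separate controllers (or a faster interface such as SATA) matter for an array.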
On the other hand, if you happen to have a machine lying around with multiple SATA controllers on the motherboard, a software RAID might be a good solution to get your life data storage going.
I didn’t have a machine with specs like that, or one with a spare PCIe slot for a decent RAID control card, so I started thinking about wireless NAS (Network Attached Storage) solutions. There are lots of these out there. Basically they’re boxes that hold two to four SATA drives, can stripe them into a RAID via onboard hardware, attach to a home network router via Gigabit Ethernet or Wi-Fi, and let PCs and mobile devices on that same network access the storage.
By this point, I was tired of Googling things, so I headed for Pantip, Bangkok’s largest electronics bazaar, just to see what I could find (sometimes you’ve just got to get outside and make a life to log). Mostly by chance, I stumbled upon this solution, a RAID enclosure with onboard hardware for striping and building a RAID from up to four SATA drives. There’s no network interface, but there is a slew of connection options (USB 2.0, FireWire 400/800, and eSATA) for hooking it up to a host PC. I put four 1 TB drives in it (though I could have gone for 1.5 TB or even 2 TB drives) and attached it to the Windows box that’s connected to my TV and home theater system (my Boxee media server). Setup was easy (though Googling was involved to get Windows to recognize a >2 TB partition).
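The usual explanation for the >2 TB hiccup is the classic MBR partition table, which stores sector counts in 32-bit fields. Assuming 512-byte sectors, that caps a partition at 2 TiB, and the common way around it is a GPT partition table. The post doesn't record which fix was actually used here, so treat the sketch below as the general arithmetic only; `needs_gpt` is a hypothetical helper name.

```python
SECTOR_BYTES = 512         # assumed logical sector size
MBR_MAX_SECTORS = 2 ** 32  # MBR stores sector counts in 32-bit fields

mbr_limit_bytes = SECTOR_BYTES * MBR_MAX_SECTORS


def needs_gpt(volume_bytes):
    """True if a volume is too large for a single MBR-style partition."""
    return volume_bytes > mbr_limit_bytes


print(mbr_limit_bytes / 2 ** 40)  # 2.0 (TiB)
print(needs_gpt(3 * 10 ** 12))    # True: a 3 TB RAID5 volume is over the limit
print(needs_gpt(1 * 10 ** 12))    # False: a single 1 TB drive fits
```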
The reviews for this unit are largely good, but there are some complaints too (as there are with all hardware). I’m not too worried though. I also plan on off-site backups of my life data to Amazon S3 for a secondary level of redundancy. That’ll have to be a topic for a future post though.
So, for less than 500 USD (and prices on imports here are inflated somewhat, so you can likely get something like this going for cheaper in North America or Europe), I now have 2.72 TBs of usable life data storage (one drive’s worth is lost to RAID5 redundancy, and the remaining 3 TB reads as about 2.72 TB in the binary units Windows reports; but this means that so long as I don’t lose more than one drive of the four simultaneously, my data is reasonably safe).
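Why losing one drive is survivable comes down to XOR parity. The toy sketch below spreads one stripe over three "drives", keeps their XOR on a fourth, and rebuilds a lost block from the survivors. It is a simplification under stated assumptions: real RAID5 rotates parity across all the drives and works on much larger blocks, and the enclosure's actual firmware is not something this models.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if one stripe were spread over three drives.
data = [b"life", b"data", b"blog"]
parity = xor_blocks(data)  # what the fourth drive would hold for this stripe

# Simulate losing the second drive and rebuilding it from the rest.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'data'
```

Lose two blocks from the same stripe, though, and there is no longer enough information to recover either, which is where the off-site backup comes in.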
Now, let the consolidation of my life data begin!