The large-drive scenario is very attractive from a price-per-gigabyte standpoint, but the fundamental reliability of these storage methods has not increased in proportion to the growth in size. I am also finding that few people in my "environment" have any grasp of what a gigabyte is and how long it takes to move it, bit by bit and byte by byte, without corruption.
I wouldn't worry about transfer protocol/media reliability that much, as there are generally enough checks built into all of the transport protocols to avoid corruption (speed, for sure, is another matter; it's kinda painful to take a week to recover from backups when you need to). The drives themselves are even better than they were 15-20 years ago (although single-bit errors are more common, drives now have pretty good recovery/checks built in for those). The main problems left (simplified) are either catastrophic whole-drive failure or failure of whole blocks. So basically, structure your RAID to reduce the odds of those sufficiently that you don't have to deal with the slow solution in any meaningful timeframe.
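If you're ever paranoid about one particular big copy, a quick hash comparison settles it cheaply. A rough sketch in Python (the file paths are just placeholders):

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        # Stream the file through SHA-256 so huge files don't have to fit in RAM.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical source and destination of a large copy.
    src, dst = "/data/archive.img", "/mnt/usb/archive.img"
    print("copy verified" if sha256_of(src) == sha256_of(dst) else "mismatch -- recopy")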
So what do you think of the resulting two strategies I am thinking of to offset some of the risk of a second drive failure during recovery?
a) Make the drives smaller and put more of them in the RAID. In my case there is a limit on the maximum number of drives for RAID 10 or 0+1, but there are other levels such as RAID 30 or 0+3.
My thinking is that if the data is important and availability (or, as you say, resiliency) is an issue, it's better to use smaller drives and more of them.
Surprised that R30 is available... At 6 disks it does offer somewhat better space utilization than R10, but no better than R50. The failure probabilities are no better than R50, I don't think better than R10 (math is hard...), and I believe worse than R6. Personally I'd pick R10, R5/R50, or R6 over R3/R30 because the former are more commonly used, meaning the code paths are better tested. I like well-used code paths; it means someone else has already found the problems.
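For a rough feel of the "math is hard" part, here's a back-of-envelope sketch in Python. It assumes independent drive failures at some per-drive probability p over the rebuild window, which is a big simplification (real failures correlate), and the value of p is made up purely for illustration:

    from math import comb

    def p_loss_r10(p, pairs):
        # R10 dies if ANY mirror pair loses both of its drives.
        return 1 - (1 - p * p) ** pairs

    def p_loss_r5(p, n):
        # R5 (single parity) dies if 2 or more of the n drives die.
        return 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)

    def p_loss_r6(p, n):
        # R6 (double parity) dies if 3 or more of the n drives die.
        survive = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))
        return 1 - survive

    p = 0.02  # assumed per-drive failure odds over the window -- made up
    print(f"R10, 6 drives: {p_loss_r10(p, 3):.2e}")
    print(f"R5,  6 drives: {p_loss_r5(p, 6):.2e}")
    print(f"R6,  6 drives: {p_loss_r6(p, 6):.2e}")

Under those crude assumptions R6 comes out safest at the same drive count, with R10 in between and R5 worst; the real picture shifts once you account for rebuild times and correlated failures.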
b) Since we tend to build these computers with all the drives coming from pretty much the same manufacturing batch, my thought has been to buy 50% of the required drives initially, leave a decent gap (what "decent" is I have not determined yet), and then buy the other half. The thinking here is that the mirror element gets added later, gambling that the second batch is unlikely to fail at the same time as the first. Whereas if they all start out together and one reaches the stage where failure takes place, the view I have read is that there is a greater probability of catastrophic failure during recovery, because all of the drives share the same "age" and "wear" status.
You have two likely failure curves. First is the bathtub curve (infant mortality, followed by the "good times", followed by age and decay... amusingly, software has a similar lifecycle). Almost all hardware follows this to some extent (some have higher beginning or end curves, of course). The second is just bad parts/manufacturing defects (we had one batch of several hundred drives all have a spring fail within ~2 weeks of each other, at just about 6 months). Mitigating against both is kinda hard. An alternative strategy would be to buy from two manufacturers (making sure they aren't just relabeled) and make those the mirror pairs.
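To put a rough number on the batch gamble, here's a toy Monte Carlo in Python. The model and every parameter in it are invented for illustration: each batch may carry a common defect, drives from a bad batch mostly fail inside the window, and we compare mirrors built from one batch against mirrors split across two:

    import random

    TRIALS = 100_000
    P_BAD_BATCH = 0.01   # assumed odds a batch has a common defect -- made up
    P_DIE_IF_BAD = 0.8   # assumed odds a drive from a bad batch dies in the window
    P_DIE_NORMAL = 0.02  # assumed baseline per-drive failure odds

    def drive_dies(batch_is_bad):
        return random.random() < (P_DIE_IF_BAD if batch_is_bad else P_DIE_NORMAL)

    def mirror_lost(same_batch):
        bad_a = random.random() < P_BAD_BATCH
        bad_b = bad_a if same_batch else random.random() < P_BAD_BATCH
        return drive_dies(bad_a) and drive_dies(bad_b)

    for label, same in (("same-batch mirror", True), ("mixed-batch mirror", False)):
        losses = sum(mirror_lost(same) for _ in range(TRIALS))
        print(f"{label}: ~{losses / TRIALS:.4%} chance of losing both sides")

With these made-up numbers the mixed-batch mirror loses both sides roughly an order of magnitude less often. The point is only that de-correlating the pair is what buys the safety, whether you do it by staggered purchase or by mixing manufacturers.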
I don't think your initial plan is/was bad; you just need to consider the risks and see if the cost/benefit is worth it.
Mostly what you're trying to do with the local redundancy is reduce time to recovery when you either:
a) suffer a disk failure
or possibly
b) accidentally oops a file or three
Another option to consider is adding a USB-connected external drive that you periodically mirror to. With that, you could take incremental backups from the point of the mirror, and then you'd usually only have a small amount to recover if the main drive failed.
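A minimal sketch of that idea in Python, assuming hypothetical paths for the data and the USB mount (a real setup would more likely lean on rsync or the OS's own backup tool):

    import shutil, time
    from pathlib import Path

    DATA = Path("/home/me/data")      # hypothetical source directory
    USB = Path("/mnt/usb/backup")     # hypothetical USB mount point
    MIRROR, STAMP = USB / "mirror", USB / "mirror_time"

    def full_mirror():
        # The slow, occasional operation: refresh the complete mirror.
        if MIRROR.exists():
            shutil.rmtree(MIRROR)
        shutil.copytree(DATA, MIRROR)
        STAMP.write_text(str(time.time()))

    def incremental():
        # The fast, frequent one: copy only files changed since the mirror.
        since = float(STAMP.read_text())
        dest = USB / "increments" / time.strftime("%Y%m%d-%H%M%S")
        for f in DATA.rglob("*"):
            if f.is_file() and f.stat().st_mtime > since:
                target = dest / f.relative_to(DATA)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, target)

Recovery after a main-drive failure is then the mirror plus a replay of the (usually small) increments.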
You also have to remember that you're dealing with very small probabilities here. This gets really important when you're working with hundreds of computers or super-high volumes of data, but the actual odds of a home array failing in the lifespan of the computer are really quite low. So it's worth spending some effort for peace of mind, but eventually you pick one of several decent options, quit worrying, and move on.
My current home setup is a two-disk R1 array for kinda-critical data, backed up to a USB/network drive (these oddly don't cost much more than bare drives, and sometimes cost less because of volume contracts, etc.), and a 5-disk R5 array for less important stuff (5 disks because we had 5 left over from another project, not because it's better than three larger ones). I'd have gone R6 but the setup didn't warrant it (and the R1 array is also the boot array, because that's easier to recover the OS from, as either disk works standalone).
I would embrace the "cloud" aspect and online storage to a far greater extent if I knew that companies would go on forever. But they don't. Things happen, and when a company is sold or goes bankrupt and that server with your data on it gets scrapped or sold piecemeal, one's data is either in the wrong hands or gone from the face of the earth.
Our old Kodak happy snaps are more likely to still be around, when you consider what b&w images we still see today from people's pasts. Excuse me if I am skeptical about the whole cyber world.
A smart friend (and mentor) of mine once explained it to me like this: "when evaluating a technology, don't consider how hard it is to get your data in; consider how hard it will be to get your data back out." That is so very true of cloud services. There is nothing magical about the "cloud"; it's just a bunch of computers in a bunch of datacenters, run by companies and people. Things happen to both.
I wouldn't worry too much about the "wrong hands" part if you are diligent about encryption.
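For instance, encrypting locally before anything leaves the machine. A minimal sketch using Python's third-party cryptography package (file names are placeholders; key management is the part you actually have to get right):

    from cryptography.fernet import Fernet  # pip install cryptography

    # Generate once and store somewhere that is NOT the cloud --
    # lose the key and the data is gone for good.
    key = Fernet.generate_key()
    f = Fernet(key)

    plaintext = open("photos.tar", "rb").read()          # hypothetical archive
    open("photos.tar.enc", "wb").write(f.encrypt(plaintext))

    # Only the .enc file ever goes to the provider. Getting it back:
    assert f.decrypt(open("photos.tar.enc", "rb").read()) == plaintext

(Fernet reads the whole file into memory, so for big archives you'd chunk it, but the principle is the same: the provider only ever sees ciphertext.)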
And we still do backups, though ZFS snapshots are VERY cool and eliminate a lot of the need for backups. It would be nice if ZFS would migrate down into Mac and Windows. Just imagine Mac Time Machine being smart enough to use ZFS snapshots. Sigh.
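For anyone who hasn't played with them, the day-to-day usage really is about this simple. A thin wrapper sketch around the real zfs commands, with "tank/home" as a hypothetical dataset name:

    import subprocess, time

    DATASET = "tank/home"  # hypothetical pool/dataset

    def snap(label=None):
        # Snapshots are instant and initially take no extra space.
        name = label or time.strftime("%Y%m%d-%H%M%S")
        subprocess.run(["zfs", "snapshot", f"{DATASET}@{name}"], check=True)
        return name

    def rollback(name):
        # Discards everything written since the snapshot
        # (needs -r if newer snapshots exist).
        subprocess.run(["zfs", "rollback", f"{DATASET}@{name}"], check=True)

    def list_snaps():
        subprocess.run(["zfs", "list", "-t", "snapshot"], check=True)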
I LOVE snapshots! They have saved me sooo much time. The downside is when you write something you didn't... want to... to the disk, and it's preserved in the snapshot - DOH!
At the current gig we've gone the other way from hyper-redundancy per computer: everything hardware-wise is replaceable, and we just throw more computers at the problem.