On Aug 5, 2014, at 1:42 PM, Gionatan Danti <g.danti@xxxxxxxxxx> wrote:
>
> I am trying to imagine _how_ the various vendors arrive at the claimed number and _how much_ we have confidence in URE rate.

I'd say it's next to useless, and a different question needs to be asked: how much redundancy is appropriate relative to the value of the data? From there you come up with a strategy that meets the uptime and redundancy preference for a given budget.

>> Furthermore, as already again stated, very likely
>> an "average" HDD has much lower URE probability.
>
> This is reassuring :)

The spec only accounts for the drive itself. It doesn't cover the cables, the controller, the computer's non-ECC memory, or, notably, one of the greatest sources of data loss: user error. It also doesn't account for the complete implosion of the drive, for any number of reasons: the head impacting the spinning surface and destroying either the data on the surface or the read/write head; actuator death; spindle motor death; logic board death; power supply death; etc.

So to mitigate drive and cable problems we use RAID. For controller, logic board, and power supply failure concerns, we use clusters. A bigger concern than a handful of UREs, even if they were to bust the manufacturer spec, is that the loss of a single drive represents hours or days of rebuild, because one drive holds so much more data today.

Right now, md RAID 6 + XFS + Gluster clusters is a rather straightforward setup. For volume snapshots to mitigate user-induced data loss, LVM2 thinly provisioned LVs can be used. I haven't tested it yet, but I think the LVM2 integrated RAID does work with thinp LVs, so it's possible to remove a layer if you're OK with the different LVM RAID management tools compared to mdadm.

Chris Murphy
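
For a rough sense of what a spec'd URE rate would imply if taken at face value, here is a minimal Python sketch. The figure of 1 error per 10^14 bits read is an assumption (a commonly quoted consumer-drive spec, not a number from this thread), as is the premise that errors are independent per bit. The function name is made up for illustration. Given the points above about how little the spec covers, treat the output as a back-of-envelope bound and not a prediction.

```python
import math


def p_at_least_one_ure(bytes_read, ure_per_bit=1e-14):
    """Probability of hitting at least one URE while reading `bytes_read` bytes.

    Assumes independent errors at a constant per-bit rate (an assumption,
    not something the drive spec guarantees).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to avoid precision loss
    # when p is tiny.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12  # decimal terabyte, as drive vendors count it

# Amount of data that must be read back during a hypothetical rebuild.
for n_tb in (1, 4, 12):
    print(f"{n_tb:>3} TB read: {p_at_least_one_ure(n_tb * TB):.1%}")
```

The more data a rebuild has to read, the closer this gets to 1, which is one way to see why rebuild size matters more than the exact spec value.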