=] I think we can end the discussion and conclude that the context
(test vs. production) determines whether we can rely on luck and
probability. What is "luck"? In production, luck means a failing disk,
and we don't allow failed disks in production: we use SMART to predict
failures, and when one disk does fail we replace several disks to
prevent another failure. Could we update our RAID wiki with some
information from this discussion?

2011/2/3 Drew <drew.kay@xxxxxxxxx>:
>> For testing, RAID 1 striped into RAID 0 has a better probability of
>> not stopping the RAID 10 array, but it's only a probability... don't
>> trust luck. Since it's only for testing, not production, it doesn't
>> matter.
>>
>> What would I implement for production? Whenever a disk fails, the
>> whole array should be replaced (or, with less money, just replace
>> the disk with the least life left).
>
> A lot of this discussion about failure rates and probabilities is
> academic. There are assumptions about each disk having its own
> independent failure probability, which, if it cannot be predicted,
> must be assumed to be 50%. At the end of the day I agree that when
> the first disk fails the RAID is degraded and one *must* take steps
> to remedy that. This discussion is more about why RAID 10 (1+0) is
> better than 0+1.
>
> On our production systems we work with our vendor to ensure the
> individual drives we get aren't from the same batch/production run,
> thereby mitigating some issues around flaws in specific batches. We
> keep spare drives on hand for all three RAID arrays, so as to
> minimize the time we're operating in a degraded state. All data on
> RAID arrays is backed up nightly to storage which is then mirrored
> off-site.
>
> At the end of the day our decision about which RAID type (10/5/6) to
> use was based more on a balance between performance, safety, and
> capacity than on specific failure criteria. RAID 10 backs the iSCSI
> LUN that our VMware cluster uses for the individual OSes, and the
> data partition for the accounting database server.
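The 1+0 vs. 0+1 point above is easy to check by enumeration. Here is a
minimal Python sketch assuming a four-disk array, disks numbered 0-3,
with (0,1) and (2,3) as the mirror pairs in RAID 10 and as the stripes
in RAID 0+1; the numbering is purely illustrative:

```python
from itertools import permutations

def raid10_survives(first, second):
    # RAID 10: (0,1) and (2,3) are mirror pairs, striped together.
    # The array dies only if the second failure hits the mirror
    # partner of the first failed disk.
    partner = {0: 1, 1: 0, 2: 3, 3: 2}
    return second != partner[first]

def raid01_survives(first, second):
    # RAID 0+1: stripes (0,1) and (2,3), mirrored. The first failure
    # takes down its whole stripe, so the array dies if the second
    # failure lands anywhere in the *other* stripe.
    stripe = {0: 0, 1: 0, 2: 1, 3: 1}
    return stripe[second] == stripe[first]

def survival_rate(survives):
    # Enumerate all ordered (first, second) failure pairs.
    cases = list(permutations(range(4), 2))
    return sum(survives(a, b) for a, b in cases) / len(cases)

print(survival_rate(raid10_survives))  # RAID 10: 2/3 of second failures survivable
print(survival_rate(raid01_survives))  # RAID 0+1: only 1/3 survivable
```

In other words: after the first failure, RAID 10 dies only if the one
remaining mirror partner fails (1 of 3 disks), while 0+1 dies if either
disk of the surviving stripe fails (2 of 3) — which is exactly why 1+0
is preferred over 0+1.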
> RAID 5 backs the partitions we store user data on. And RAID 6 backs
> the NASes we use for our backup system.
>
> RAID 10 was chosen for performance reasons. It doesn't have to
> calculate parity on every write, so for the OS & database, which do
> a lot of small reads & writes, it's faster. For user disks we went
> with RAID 5 because we get more space in the array at a small
> performance penalty, which is fine because the users have to access
> the file server over the LAN, and the bottleneck is the pipe between
> the switch & the VM, not between the iSCSI SAN & the server. For
> backups we went with RAID 6 because the performance & storage
> penalties for the array were outweighed by the need for maximum
> safety.
>
> --
> Drew
>
> "Nothing in life is to be feared. It is only to be understood."
> --Marie Curie

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html