Re: SCSI vs SATA

On Sat, 7 Apr 2007, Ron wrote:

The reality is that all modern HDs are so good that it's actually quite rare for someone to suffer a data loss event. The consequences of such are so severe that the event stands out more than just the statistics would imply. For those using small numbers of HDs, HDs just work.

OTOH, for those of us doing work that involves DBMSs and relatively large numbers of HDs per system, both the math and the RW conditions of service require us to pay more attention to quality details.
Like many things, one can decide on one of multiple ways to "pay the piper".

a= The choice made by many, for instance in the studies mentioned, is to minimize initial acquisition cost and operating overhead and simply accept having to replace HDs more often.

b= For those in fields where this is not a reasonable option (financial services, health care, etc), or for those literally using 100's of HDs per system (where failures are statistically so likely that TLC is required), policies and procedures like those mentioned in this thread (paying close attention to environment and use factors, sector remap detection, rotating HDs into and out of roles based on age, etc) are necessary.

Anyone who does some close variation of "b" directly above =will= see the benefits of using better HDs.

At least in my supposedly unqualified anecdotal 25 years of professional experience.

Ron, why is it that you assume that anyone who disagrees with you doesn't work in an environment where they care about the datacenter environment, and isn't in a field like financial services? And why do you think we are just trying to save a few pennies? (Costs do factor in, but it's not a matter of pennies, it's a matter of tens of thousands of dollars.)

I actually work in the financial services field, and I have a good datacenter environment that's well cared for.

While I don't personally maintain machines with hundreds of drives each, I do maintain hundreds of machines with a small number of drives each, and a handful of machines with a few dozen drives. (The database machines are maintained by others, but I do see their failed drives.)

It's also true that my experience only covers the last 10 years, so I've only worked with a few generations of drives, but my experience is different from yours.

My experience is that until the drives get to be 5+ years old, the failure rate seems to be about the same for the 'cheap' drives as for the 'good' drives. I won't say they are exactly the same, but they are close enough that I don't believe there is a significant difference.

In other words, these studies do seem to match my experience.

This is why, when I recently had to create some large-capacity arrays, I ended up with machines containing a few dozen drives each instead of hundreds. I've got two machines with 6TB of disk, one with 8TB, one with 10TB, and one with 20TB. I'm building these systems for ~$1K/TB for the disk arrays. Other departments who choose $bigname 'enterprise' disk arrays are routinely paying 50x that price.

I am very sure that they are not getting 50x the reliability; in fact, I'm sure they aren't even getting 2x the reliability.

I believe the biggest cause of data loss for people using the 'cheap' drives is that one 'cheap' drive holds the capacity of 5 or so 'expensive' drives, and since people don't realize this, they don't realize that the time to rebuild a failed drive onto a hot-spare is correspondingly longer.

In the thread 'Sunfire X4500 recommendations' we recently had a discussion on this topic, starting from a guy asking the best way to configure the drives in his Sun X4500 (48-drive) system for safety. In that discussion I took some numbers from the CMU study and used a 10% chance of a drive failing in a year as a working figure (the study said 5-7% in most cases, but some third-year drives were around 10%). Combining this with the time needed to write 750G using ~10% of the system's throughput gives a rebuild time of about 5 days, and it turns out there is almost a 5% chance of a second drive failing in a 48-drive array in that time. If I were to build a single array with 142G 'enterprise' drives instead of 750G 'cheap' drives, the rebuild time would be only 1 day instead of 5, but you would have ~250 drives instead of 48, so your chance of a problem would be about the same. (I acknowledge that it's unlikely anyone would use 250 drives in a single array, and yes, splitting them up helps; however, if you had 5 arrays of 50 drives each you would still have a ~1% chance of a second failure.)
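For anyone who wants to check the arithmetic, here is a rough Python sketch (mine, not from the original thread) using the same working assumptions spelled out in the P.S. below: a 10% annual failure rate, ~45MB/sec write speed, a rebuild throttled to ~10% of the array's throughput, and a further 2x penalty for reading the other disks to recompute parity. The function names and exact decimals are mine; the figures it prints are approximations, not measurements.

```python
# Rough sketch of the failure math above; the rates and throughput numbers
# are the working assumptions from this thread, not measured values.

HOURS_PER_YEAR = 24 * 365

annual_failure_rate = 0.10                                    # ~10% chance a drive fails per year
hourly_failure_rate = annual_failure_rate / HOURS_PER_YEAR    # ~0.001% per hour

def rebuild_hours(capacity_gb, write_mb_s=45.0,
                  throughput_fraction=0.10, parity_penalty=0.5):
    """Hours to rewrite one drive's capacity, using only a fraction of the
    array's throughput and losing half of that to reading/recomputing parity."""
    effective_mb_s = write_mb_s * throughput_fraction * parity_penalty
    return capacity_gb * 1000.0 / effective_mb_s / 3600.0

def second_failure_chance(total_disks, capacity_gb):
    """Chance that at least one of the surviving disks also fails before the
    hot-spare rebuild finishes."""
    per_disk = hourly_failure_rate * rebuild_hours(capacity_gb)
    return 1 - (1 - per_disk) ** (total_disks - 1)

# 48 x 750G 'cheap' drives vs ~250 x 142G 'enterprise' drives,
# plus the 5-groups-of-50 variant mentioned above.
for disks, cap in [(48, 750), (250, 142), (50, 142)]:
    print(f"{disks:>3} x {cap}G: rebuild ~{rebuild_hours(cap) / 24:.1f} days, "
          f"second failure during rebuild ~{second_failure_chance(disks, cap):.1%}")
```

With these assumptions the 48 x 750G and 250 x 142G cases come out essentially the same (~5%), and each 50-drive group of 142G drives comes out around 1%, which is the point of the comparison above.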

When I look at these numbers, my reaction isn't that it's wrong to go with the 'cheap' drives; my reaction is that single redundancy isn't good enough. Depending on how valuable the data is, you need to either replicate the data to another system or go with dual-parity redundancy (or both).

While drives probably won't be this bad in real life (this is, after all, slightly worse than the studies show for their third-year drives, and 'enterprise' drives may be slightly better), I have to assume they will be for my reliability planning.

Also, if you read through the CMU study, drive failures were only a small percentage of system outages (16-25% depending on the site). You have to make sure you aren't so fixated on drive reliability that you fail to account for other types of problems (down to and including the chance of someone accidentally powering down the rack you are plugged into, whether by hitting a power switch or overloading a weak circuit breaker).

In looking at these problems overall, I find that in most cases I need to have redundant systems with the data replicated anyway (with logs sent elsewhere), so I can get away with building failover pairs instead of giving each machine redundant drives. I've found that I can frequently get a pair of machines for less money than other departments spend on a single 'enterprise' machine with the same specs (although prices from the top-tier manufacturers are dropping enough that this is less true today than it was a couple of years ago), and the failure rate is about the same on a per-machine basis, so I end up with a much better uptime record due to having the redundancy of the second full system (never mind things like upgrades being easier, since I can work on the inactive machine and then fail over to work on the other, now inactive, machine).

While I could ask for the budget to be doubled to get the same redundancy from the top-tier manufacturers, I don't, for two main reasons: these manufacturers frequently won't configure a machine the way I want (just try to get a box with writable media built in, either a floppy or a CD-R/DVD-R; they want you to use something external), and doing so also exposes me to people second-guessing where redundancy is needed ('that's only development, we don't need redundancy there', until a system goes down for a day and the entire department is unable to work).

It's not that the people who disagree with you don't care about their data; it's that they have different experiences than you do (experiences that come close to matching the studies that tracked hundreds of thousands of drives of different types), and as a result they believe that the difference (if any) between the drive types isn't significant in the overall failure rate (especially when you take the difference in drive capacity into account).

David Lang

P.S. Here is a chart from that thread showing the chances of losing data with different array configurations.

If you say there is a 10% chance of a disk failing each year (significantly higher than the studies listed above, but close enough), that works out to ~0.001% chance of a drive failing per hour (a reasonably round number to work with).

Writing 750G at ~45MB/sec takes about 5 hours at 100% of the system throughput, or ~50 hours at 10% of the system throughput (background rebuilding).

If we cut this in half to account for inefficiencies in retrieving data from the other disks to calculate parity, it can take 100 hours (just over four days) to do a background rebuild, which gives each disk about a 0.1% chance of failing during that window. With 48 drives this is a ~5% chance of losing everything with single parity; however, the odds of losing two disks during this time are only ~0.25%, so double-parity is _well_ worth it.

Chance of losing data before the hot-spare has finished rebuilding (assumes one hot-spare per group; you may be able to share a hot-spare between multiple groups to get slightly higher usable capacity).

RAID 60 or Z2 -- double-parity: must lose 3 disks from the same group to lose data:
disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
            2          24           48           n/a                n/a
            3          16           48           n/a         (0.0001% with manual replacement of drive)
            4          12           48            12         0.0009%
            6           8           48            24         0.003%
            8           6           48            30         0.006%
           12           4           48            36         0.02%
           16           3           48            39         0.03%
           24           2           48            42         0.06%
           48           1           48            45         0.25%

RAID 10 or 50 -- mirroring or single-parity: must lose 2 disks from the same group to lose data:
disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
            2          24           48            n/a        (~0.1% with manual replacement of drive)
            3          16           48            16         0.2%
            4          12           48            24         0.3%
            6           8           48            32         0.5%
            8           6           48            36         0.8%
           12           4           48            40         1.3%
           16           3           48            42         1.7%
           24           2           48            44         2.5%
           48           1           48            46         5%

So if I've done the math correctly, the odds of losing data with the worst-case double-parity layout (one large array including a hot-spare) are about the same as the best-case single-parity layout (mirrors plus a hot-spare), but with almost triple the usable capacity.
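For completeness, here is a small Python sketch (again mine, not from the thread) that recomputes the chart from the same working assumption of a ~0.1% per-disk failure chance during the rebuild window. It uses a straight binomial calculation, so it doesn't reproduce the chart's rounding exactly (the double-parity figures in particular come out somewhat lower than the chart above), but the relative conclusions are the same.

```python
# Recompute the chart above from the working assumption that each surviving
# disk has a ~0.1% chance of failing during the rebuild window.
from math import comb

P_DISK = 0.001   # ~0.1% chance a given disk fails during the rebuild window

def extra_failures(n_others, k, p=P_DISK):
    """Probability that at least k of the n_others surviving disks in the
    group fail before the rebuild completes (binomial tail)."""
    return sum(comb(n_others, i) * p**i * (1 - p)**(n_others - i)
               for i in range(k, n_others + 1))

print("disks_per_group  single-parity risk  double-parity risk")
for group in (2, 3, 4, 6, 8, 12, 16, 24, 48):
    others = group - 1                     # one disk in the group has already failed
    single = extra_failures(others, 1)     # one more failure loses the group
    double = extra_failures(others, 2)     # two more failures lose the group
    print(f"{group:>15}  {single:>18.3%}  {double:>18.4%}")
```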



