Re: 3TB drives failure rate

On Oct 28, 2012, at 2:49 PM, Roman Mamedov <rm@xxxxxxxxxx> wrote:

> On Sun, 28 Oct 2012 14:34:49 -0600
> Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> 
>> A drive that can sit there like a bump on a log for 2 minutes before it issues an actual read failure on sector(s) means mdadm is likewise waiting, and everything above it is waiting. That's a long hang. I'd not be surprised to see users go for a hard reset with such a system hang after 45 seconds, let alone 2 minutes.
> 
> Which is not different from what you would get (the same hang) in a non-RAID
> environment on the same drive, so I don't see how this would be specifically a
> RAID-related problem.

The difference is what you depend on that RAID to run. If it's someone playing movies or games, who cares. If you're a small business running a database or web site off this hardware and everything crawls for two minutes, or implodes because of that delay, then it's just bad design.


> 
>> Anyway, the idea mdadm users can't benefit from shorter ERC is untrue. They certainly can. But the open question is why would they be getting such long error recovery times in the first place? 7 seconds is a long time.
> 
> One thing is "benefit", i.e. just a comfort issue, to prevent pauses when a
> drive starts failing (but really… does that happen very often?

Yes. And obviously with a drive that delays this by 2 minutes it's going to be an increasing problem. Is it any wonder that "crazy" home users with these drives describe deterministic machines as "wow, the computer is really slow lately," and then some geek blames it on fragmentation? It would not surprise me if these drives are in deep recovery on critical sectors.

If this were untrue, there'd be no need for short recovery times. As it turns out, it's a problem for any remotely serious RAID usage.


> and during
> those times do you really care so that there are no pauses for error recovery,
> or maybe you just want to replace the drive and still have your data safe?),
> and another thing is a complete RAID failure that you'd get with a
> hardware RAID controller simply because something is not within 7 seconds,
> which is the whole source of that vendor-supported myth of "non-RAID drives".

Yeah, a controller that totally fails a drive for a delayed recovery of one to a few sectors? Bad sysadmin. He picked the wrong drive, or he picked the wrong drive timeout in the controller. Either way, it's a mismatch, and he's a bad sysadmin if it's important data and not movies pirated off the internet.
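
To make the mismatch concrete, here is a minimal Python sketch of the comparison. It is only an illustration, not how any particular controller or mdadm decides things: the function name classify_timeouts and the 5 second drive limit are made up for the example, while the 7 second controller limit and the 2 minute recovery are the figures from this thread.

```python
def classify_timeouts(drive_recovery_s: float, host_timeout_s: float) -> str:
    """Compare a drive's worst-case error recovery time with the host's timeout.

    Hypothetical helper: the host (controller or driver) is assumed to drop
    the drive if it has not answered within host_timeout_s seconds.
    """
    if drive_recovery_s < host_timeout_s:
        # The drive gives up and reports the read error first, so the RAID
        # layer can rebuild the sector from redundancy.
        return "ok: drive reports the error before the host gives up"
    # The host stops waiting while the drive is still retrying internally.
    return "mismatch: host may fail the drive while it is still recovering"


# 7 s controller limit and ~120 s deep recovery come from the thread;
# the 5 s short-recovery drive is an assumed value for illustration.
CONTROLLER_LIMIT_S = 7.0
for label, recovery_s in (("short recovery", 5.0), ("deep recovery", 120.0)):
    print(label, "->", classify_timeouts(recovery_s, CONTROLLER_LIMIT_S))
```

Either side of the comparison can be changed to fix a mismatch: shorten the drive's recovery time or lengthen the timeout above it.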

Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

