Re: smart short test crashes software raid array?


 



On 11/3/19 2:10 am, Wols Lists wrote:
On 10/03/19 11:14, Reindl Harald wrote:
On 10.03.19 at 10:55, Andy Smith wrote:

On Sat, Mar 09, 2019 at 11:53:22PM +0100, Reindl Harald wrote:
On 09.03.19 at 23:32, Wols Lists wrote:
Well, my first take on that is that they are NOT raid-quality drives!!!
when i hear such shit i frankly could puke!
I suspect that Wols means that these drive models cannot set SCTERC so
will retry for a very long time, requiring the block layer timeouts to
be set to 2+ minutes or else risk drive being kicked out by the kernel
whenever there is a minor problem.

If so, this is a factual thing, in that manufacturers really did produce
drives that are sub-optimal for RAID.
no, the problem is that you need to change those timeouts because of bad
defaults

Which is why I said in my original email that you need to make sure the
timeout script runs ...

These drives *are* sub-optimal in that (a) they are unfit for raid use
"out of the box", and (b) they cannot be configured suitably for such use.


I think part (b) of your comment is incorrect. They *can* be configured (by the OS on every boot) for use in a Linux software RAID, but the resulting array will behave differently from what might be expected. Anyone configuring their OS this way should be aware of how it affects the system and the array, and what to expect.


You have to muck about with the OS *every* *boot*, and the changes are
such that if there is a problem the machine will appear to hang because
it takes something like two to three minutes to sort itself out. This is
painful on a desktop, and intolerable on a server, if your process hangs
that long waiting for a read to complete.

It depends on what your "server" is doing. For some "servers" this wouldn't be an issue at all.

For a backup server where online access is not important, these drives
are okay. For systems where you have users expecting a fast response,
they are not.

Right, so in some scenarios they are fine, and in others they are not. Pick the hardware that suits your requirements (or change the requirements to suit the budget, etc.).

Personally, I stuck to WD Black drives for a long time; recently I've been using the WD Red Pro. I've also had a *lot* of disk failures (on the WD Blacks; luckily they had a 5 yr warranty). These days I use a lot of SSDs for RAID, but it all depends on your budget/requirements.

I'd like to modify the raid layer such that it times out quickly, and
recalculates and rewrites the data after a few seconds, such that these
drives cease to be a problem,

I probably know a lot less here, but is this a RAID layer issue? Isn't it up to root to decide how long the timeout is (below the RAID layer, e.g. at the SATA or SCSI layer)? Ideally, you don't WANT the RAID layer to kick a "slow" disk: we don't know *why* it is slow, and as soon as we kick it, or make it do more work than it is already doing, we risk making the problem even worse (i.e. losing redundancy).


but stick that on the long list of raid
papercuts I'd like to sort out when I can find the time to learn to
program the raid subsystem!

I thought there were some moves to add "magic" udev scripts to detect and set timeouts on "bad" drives that were being used as RAID members. Did that ever get finalised and accepted by the various "main" OS distributions? I thought this was the better option (keep the drive in the array, even though all array operations will freeze for a few minutes, so we can keep redundancy and "everything magically works"). It was supposed to change "My array isn't working and I lost all my data, and Linux software RAID is crap" into "My array freezes for 2 or 3 minutes every couple of weeks, usually I don't notice, but it's a little annoying when I do. Anyone know why?"
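
For reference, the per-boot configuration discussed in this thread can be sketched roughly as follows. This is a minimal Python sketch, not the actual timeout script or any distribution's udev rule. The function names (try_set_scterc, configure_drive, configure_all), the 7-second ERC value and the 180-second fallback are illustrative assumptions, and it assumes smartctl exits non-zero when a drive rejects the SCTERC command.

```python
import subprocess
from pathlib import Path

ERC_DECISECONDS = 70      # assumed value: 7.0 s drive-side error recovery limit
FALLBACK_TIMEOUT_S = 180  # assumed value: kernel command timeout without SCTERC


def try_set_scterc(device, run=subprocess.run):
    """Ask the drive to cap error recovery; True if smartctl exits with 0."""
    result = run(
        ["smartctl", "-l", f"scterc,{ERC_DECISECONDS},{ERC_DECISECONDS}", device],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def configure_drive(name, sysfs_root=Path("/sys/block"), run=subprocess.run):
    """Prefer SCTERC; otherwise raise the kernel's per-device command timeout."""
    if try_set_scterc(f"/dev/{name}", run=run):
        return "scterc"
    # The drive may retry internally for minutes, so give the kernel more patience.
    timeout_file = Path(sysfs_root) / name / "device" / "timeout"
    timeout_file.write_text(f"{FALLBACK_TIMEOUT_S}\n")
    return "long-timeout"


def configure_all(sysfs_root=Path("/sys/block"), run=subprocess.run):
    """Apply configure_drive to every sd* disk; needs root on a real system."""
    return {
        entry.name: configure_drive(entry.name, sysfs_root, run)
        for entry in sorted(Path(sysfs_root).glob("sd*"))
    }
```

Calling configure_all() as root at every boot would give roughly the behaviour described above: drives that accept SCTERC fail reads quickly so the array can rewrite the data, while the others stay in the array at the cost of multi-minute stalls.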

Just my opinions above.... I don't post much on the list, but I try to read it and stay informed.

Regards,
Adam

--
Adam Goryachev Website Managers www.websitemanagers.com.au

--
The information in this e-mail is confidential and may be legally privileged.
It is intended solely for the addressee. Access to this e-mail by anyone else
is unauthorised. If you are not the intended recipient, any disclosure,
copying, distribution or any action taken or omitted to be taken in reliance
on it, is prohibited and may be unlawful. If you have received this message
in error, please notify us immediately. Please also destroy and delete the
message from your computer. Viruses - Any loss/damage incurred by receiving
this email is not the sender's responsibility.


