Re: What are mdadm maintainers to do? (error recovery redundancy/data loss)

Chris <email.bug@xxxxxxxx> · Tue, 17 Feb 2015 23:33:26 +0000 (UTC)

Chris Murphy writes:

> 
> It's not just mdadm. It likewise affects Btrfs, ZFS, and LVM.

Do they have their own timeouts, or do they rely on the kernel?
Maybe the kernel could read the SCT ERC value from the drives (in lieu of
some better retry timeout information) and set the controller timeout a little
greater than that, or very large if SCT ERC is disabled or not available.
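
To make the idea concrete, here is a rough Python sketch of that selection
rule. It is only an illustration: the function name, the 5s margin and the
180s fallback are my assumptions, not anything the kernel or mdadm does today.

```python
from typing import Optional

# Hypothetical helper, not an existing kernel or mdadm interface.
# SCT ERC values are expressed in deciseconds (units of 100 ms): 70 = 7.0 s.

MARGIN_S = 5      # assumed safety margin above the drive's recovery limit
FALLBACK_S = 180  # assumed "very large" timeout when SCT ERC is unavailable


def command_timeout_for(erc_deciseconds: Optional[int]) -> int:
    """Return a command timeout in seconds for a drive's SCT ERC setting.

    None or 0 means SCT ERC is disabled or not supported by the drive.
    """
    if not erc_deciseconds:
        return FALLBACK_S
    erc_seconds = -(-erc_deciseconds // 10)  # round up to whole seconds
    return erc_seconds + MARGIN_S
```

With these assumed constants a drive reporting 70 (7.0s) would get a 12s
timeout, and a drive without SCT ERC would get 180s.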

> sda1 and sdb1 are raid0, and sda2 and sdb2 are
> raid1. What's the proper configuration for SCT ERC and the SCSI
> command timer?

guessing...

For drives with SCT ERC disabled:
A compromise may be to stay with the Linux default controller timeout, which is
30s, and set the drive's SCT ERC timeout below 30s (maybe 27s), to avoid losing
redundancy and risking data loss *AND* allow more of the available time for ERC.

For longer error correction attempts (and just as long I/O controller
blocking!) the controller timeout could be set to 180s, and SCT ERC to 175s?

BUT: If I choose to use a raid0 alongside a redundant raid, I have already
explicitly decided to take all the data loss the hardware throws at me. So I
don't think it makes much of a difference whether ERC times out after <30s or
180s; it's just more or fewer errors that are mine to deal with.

For drives with SCT ERC enabled:
A 30s controller timeout and 7s SCT ERC seem ok?
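
For illustration, the pairs guessed at above could be turned into commands
roughly like this. It is a Python sketch that only builds the command strings
and runs nothing; the function name is made up. `smartctl -l scterc,R,W` takes
its values in deciseconds, and the command timer lives in
/sys/block/<disk>/device/timeout (seconds).

```python
from typing import List


def timeout_commands(disk: str, command_timeout_s: int, erc_s: int) -> List[str]:
    """Build (not run) the shell commands for one timeout pair.

    Hypothetical helper: `disk` is a whole-disk name such as "sda".
    """
    if erc_s >= command_timeout_s:
        # The drive must give up before the kernel's command timer fires.
        raise ValueError("SCT ERC limit must be shorter than the command timeout")
    erc_ds = erc_s * 10  # smartctl expects deciseconds
    return [
        f"smartctl -l scterc,{erc_ds},{erc_ds} /dev/{disk}",
        f"echo {command_timeout_s} > /sys/block/{disk}/device/timeout",
    ]


# The (command timeout, SCT ERC) pairs discussed above, in seconds.
for pair in [(30, 27), (180, 175), (30, 7)]:
    print(timeout_commands("sda", *pair))
```

Both settings apply to the whole drive, so sda1 and sda2 necessarily share
whatever is chosen for sda.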

> *shrug* I don't think the automatic udev configuration idea is fail
> safe. It sounds too easy for it to automatically cause a
> misconfiguration.

A matching timeout configuration reliably prevents an unavoidable, unrecoverable
read error from taking down the redundancy and causing a high risk of data
loss during the rebuild.

It does fix a misconfiguration. However, it could possibly set SCT ERC just below
the (30s) controller timeout, to reduce the impact of SCT ERC (e.g. to make use of
the small chance of error correction succeeding a couple of seconds later),
provided the longer SCT ERC timeout does not lead to subsequent read error timeouts
piling up.
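
The mismatch that such an automatic configuration would have to detect can be
sketched like this (Python, purely illustrative). The function name is
hypothetical, and the 120s figure for drives without SCT ERC is only my guess
at a worst-case internal retry time, not a documented value.

```python
from typing import Optional


def timeouts_mismatched(
    erc_deciseconds: Optional[int],
    command_timeout_s: float,
    recovery_guess_s: float = 120.0,  # assumed worst case without SCT ERC
) -> bool:
    """Return True if the drive may still be retrying when the timer expires.

    erc_deciseconds: SCT ERC limit in units of 100 ms; None or 0 if disabled.
    """
    if not erc_deciseconds:
        # Without SCT ERC the host has no bound on recovery time; fall back
        # to the guessed worst case.
        return recovery_guess_s >= command_timeout_s
    return erc_deciseconds / 10 >= command_timeout_s
```

Under that assumption a drive without SCT ERC on the default 30s timer counts
as mismatched, while 7s SCT ERC with 30s does not.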

> And it also doesn't at all solve the problem that
> there's next to no error reporting to user space.

That is correct, but it is not really related to the importance of fixing the timeout
mismatch and reducing the risk, is it? The settings do solve the unnecessary loss
of redundancy on read errors that are sure to occur, the unnecessary resyncing,
and the high risk of data loss during all that.
