On 13-07-2017 18:48, Roman Mamedov wrote:
Failed reads are not as bad, as they are just retried.
I agree, I reported them only to give a broad picture of the system
state :)
Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED
But these WILL cause incorrect data to be written to disk, in my
experience. After that, one of your disks will contain some corruption,
whether in files or (as you discovered) in the filesystem itself.
This is the "scary" part: if the write was not acknowledged as committed
to disk, why did the block layer not report it to the MD driver? Or, if
the block layer did report it, why did MD not kick the disk out of the
array?
mdadm may or may not read from that disk, as it chooses the mirror for
reads pretty much randomly, using the least loaded one. And even though
the other disk still contains good data, there is no mechanism for the
user-space to say "hey, this doesn't look right, what's on the other
mirror?"
I understand and agree with that. I'm fully aware that MD cannot (by
design) detect or correct corrupted data. However, I wonder why a disk
with obvious errors was not kicked out of the array.
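As a side note to the point above: MD cannot tell which mirror holds the good data, but after a scrub (writing "check" to the array's sync_action file in sysfs) it does expose a mismatch counter and a per-member state. The following is only a minimal sketch for reading them. The array name "md0" and the helper names are assumptions made for illustration, and the sysfs_root parameter exists only so the functions can be exercised against a fake directory tree.

```python
from pathlib import Path


def md_mismatch_count(array="md0", sysfs_root="/sys"):
    """Return mismatch_cnt as left by the last check/repair run.

    A non-zero value means the copies disagreed somewhere; it does not
    say which member is the wrong one.
    """
    path = Path(sysfs_root) / "block" / array / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


def md_member_states(array="md0", sysfs_root="/sys"):
    """Map each member device (e.g. 'sda1') to its MD state string."""
    md_dir = Path(sysfs_root) / "block" / array / "md"
    states = {}
    for dev_dir in sorted(md_dir.glob("dev-*")):
        name = dev_dir.name[len("dev-"):]
        states[name] = (dev_dir / "state").read_text().strip()
    return states
```

A member that logged write failures but still shows "in_sync" here would match the behaviour discussed in this thread: the disk misbehaves, yet MD has not marked it faulty.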
Check your cables and/or disks themselves.
I tried reseating and swapping the cables ;)
Let's see whether the problem disappears or whether it "follows" the
cable/drive/interface...
If you know that only one disk had these write errors all the time, you
could try disconnecting it from the mirror, and checking if you can get
a more consistent view of the filesystem on the remaining one.
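To find out whether only one disk logged these errors, the "failed command" lines in the kernel log can be tallied per ATA port. This is a rough sketch: the regular expression is an assumption based solely on the single log line quoted earlier in this message, and the function name is made up for illustration. Mapping a port such as ata1.00 back to a /dev/sdX device is not covered here.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in this thread, e.g.
# "... kernel: ata1.00: failed command: WRITE FPDMA QUEUED"
FAILED_CMD = re.compile(r"(ata\d+(?:\.\d+)?): failed command: (.+)$")


def count_failed_commands(lines):
    """Count failed ATA commands, keyed by (port, command name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    ]
    for (port, command), n in count_failed_commands(sample).items():
        print(port, command, n)
```

If all WRITE FPDMA QUEUED failures point at a single port, that member is the natural candidate to disconnect before comparing filesystem views.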
P.S: about my case (which I witnessed on a RAID6):
* copy a file to the array, one disk will hit tons of WRITE FPDMA QUEUED
  errors (due to insufficient power and/or bad data cable).
* the file that was just copied, turns out to be corrupted when reading
  back.
* the problem disk WILL NOT get kicked from the array during this.
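The read-back corruption described in the steps above can be detected by comparing checksums of the source and the copy. This is just a sketch with hypothetical helper names. One loud caveat: the read-back may be served from the page cache rather than from the disks, so a matching checksum right after the copy is not proof that the on-disk data is good. Dropping caches or remounting before verifying is left out here.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, flush dst to disk, and compare checksums.

    Returns True when the copy reads back identical to the source.
    """
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    # Reopen without truncating and ask the kernel to flush the file.
    with open(dst, "r+b") as f:
        os.fsync(f.fileno())
    return sha256_of(dst) == expected
```

A False result after copying onto the array would reproduce the second step of the list without any manual inspection of the file.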
Wow, a die-hard data corruption. It seems VERY similar to what happened
to me, and the key problem seems to be the same: a failing drive was not
detached from the array in a timely fashion.
Thanks very much for reporting, Roman.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8