Re: Filesystem corruption on RAID1

On Thu, 13 Jul 2017 17:35:12 +0200
Gionatan Danti <g.danti@xxxxxxxxxx> wrote:

> Jul 10 03:24:01 nas kernel: ata1.00: failed command: READ FPDMA QUEUED

Failed reads are less serious, since they are simply retried.

> Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED

But these WILL cause incorrect data to be written to disk, in my experience.
After that, one of your disks will contain some corruption, whether in files
or (as you discovered) in the filesystem itself. md may or may not read from
that disk, as it picks the mirror for each read more or less arbitrarily,
favoring the least loaded one. And even though the other disk still contains
good data, there is no mechanism for user space to say "hey, this doesn't
look right, what's on the other mirror?"

Check your cables and/or disks themselves.
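
To narrow down which disk to look at, it can help to tally the failed
commands per ATA device from the kernel log. A minimal sketch, assuming the
log lines look like the two quoted above; the function name and the regular
expression are my own, not part of any tool:

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: ata1.00: failed command: WRITE FPDMA QUEUED"
# Assumption: the command name starts with READ or WRITE.
FAILED_CMD = re.compile(r"(ata\d+\.\d+): failed command: (READ|WRITE) ")

def tally_failed_commands(lines):
    """Count failed READ and WRITE commands per ATA device."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "Jul 10 03:24:01 nas kernel: ata1.00: failed command: READ FPDMA QUEUED",
        "Jul 12 03:14:41 nas kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    ]
    for (device, kind), n in sorted(tally_failed_commands(sample).items()):
        print(device, kind, n)
```

If the WRITE failures all land on one ataN.NN, that device (or its cable and
power feed) is the first suspect.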

If you know that only one disk had these write errors all the time, you could
try disconnecting it from the mirror, and checking if you can get a more
consistent view of the filesystem on the remaining one.
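
If you want a rough idea of how far the two halves have drifted apart, one
option is to compare them chunk by chunk. This is only a sketch under several
assumptions: the array is stopped (or the inputs are images taken from the
members), both members use the same layout, and the paths you pass in are
whatever applies on your system. It reports where the copies differ, not
which copy is the good one, and the md superblock area can legitimately
differ between members.

```python
def differing_chunks(path_a, path_b, chunk_size=1024 * 1024):
    """Yield byte offsets of chunks that differ between two files or devices."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if not block_a and not block_b:
                break  # both inputs exhausted
            if block_a != block_b:
                yield offset
            offset += chunk_size
```

Many differing offsets clustered around the times of the write errors would
support the theory that one disk took bad writes.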

P.S. About my case (which I witnessed on a RAID6):

  * copy a file to the array; one disk hits tons of WRITE FPDMA QUEUED
    errors (due to insufficient power and/or a bad data cable).
  * the file that was just copied turns out to be corrupted when read back.
  * the problem disk WILL NOT get kicked from the array during this.
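
The copy-then-read-back test above can be sketched like this. The helper
names are mine, and there is one important caveat: a read right after the
copy may be served from the page cache rather than from the disks, so a
match here proves little unless the cache is dropped or the filesystem is
remounted first.

```python
import hashlib
import shutil

def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_and_verify(src, dst):
    """Copy src to dst, then re-read both and compare their digests."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)
```

A False result after a copy onto the array would match what I saw.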

-- 
With respect,
Roman