Re: Mdadm server eating drives

Mikael Abrahamsson <swmike@xxxxxxxxx> · Tue, 18 Jun 2013 06:13:42 +0200 (CEST)

On Mon, 17 Jun 2013, Barrett Lewis wrote:

> I did notice that before rsync found one of the differences (corrupt
> files) it started spitting out those same "failed command: READ FPDMA
> QUEUED status: { DRDY ERR } error: { UNC }" errors as before, but this
> time it did not fail the drive.  I take this to mean there are still some
> physical problems with the drive, but with the new timeout settings it
> is not unnecessarily failing the drive out of the array.  So if I
> overwrite the corrupted files with the backups (or write any new data
> to the array, really), will it avoid those problem areas on the platter?

What should have happened here is that when md received the read error, it 
should have read the parity, recalculated what should have been on the 
unreadable sectors, and written that data back to them. The drive should 
then have either succeeded in writing the new data in place, or written it 
to a spare location (reallocation).
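
One way to see whether the drive has actually been reallocating sectors is 
to look at its SMART attributes. The following is only a sketch: 
reallocated_count is a made-up helper name, /dev/sdX is a placeholder for 
the real device, and it assumes smartmontools is installed and that the 
drive reports an attribute named Reallocated_Sector_Ct (not all drives use 
that name).

```shell
# Hypothetical helper: reads "smartctl -A" output on stdin and prints the
# raw value (10th column) of the Reallocated_Sector_Ct attribute, if any.
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Usage (as root; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | reallocated_count
```

A raw value that keeps growing between checks would suggest the drive is 
still finding bad sectors.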

If your system is now working well, it might make sense to issue a 
"repair" to the array and let it run through completely:

echo repair > /sys/block/md0/md/sync_action
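
To follow the repair while it runs, something like the following may help. 
It is a minimal sketch: md_status is a hypothetical helper name, and the 
default path assumes the array is md0 as in the command above. It reads 
the md sysfs files sync_action and mismatch_cnt.

```shell
# Hypothetical helper: print the current sync action and mismatch count
# for an md array. The argument is the array's md sysfs directory and
# defaults to md0 (an assumption; adjust for your array).
md_status() {
    md_dir="${1:-/sys/block/md0/md}"
    action=$(cat "$md_dir/sync_action")
    mismatches=$(cat "$md_dir/mismatch_cnt")
    echo "sync_action=$action mismatch_cnt=$mismatches"
}

# Usage:
#   md_status                      # md0
#   md_status /sys/block/md1/md    # another array
```

While the repair is running, sync_action should read "repair" and return 
to "idle" when it finishes. Overall progress is also shown in /proc/mdstat.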

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx