Greetings,
Recently I had my first drive failure in a software RAID1 on a file
server, and I was really surprised by what actually happened. I always
thought that when md can't complete a read from one of the drives, it
is supposed to mark that drive as faulty and read from the other one;
but for some reason it kept trying to read from the failing drive no
matter what, which apparently made Samba wait until the request
finished, so the whole server (well, all of Samba, anyway) became
inaccessible.
What I expected:
- A user tries to read a file via Samba.
- Samba issues a read request to md.
- md tries to read the file from one of the drives... the drive is
struggling to read a bad sector...
- md thinks: okay, this is taking too long and production can't wait;
I'll just read from the other drive instead.
- md reads from the other drive successfully, and users continue their
work.
- Finally, the "bad" drive gives up on the bad sector and returns an
error. md marks the drive as faulty and sends me an email telling me
to replace it as soon as possible (assuming monitoring is set up; see
the sketch right after this list).
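(For completeness: the email in that last step assumes mdadm's monitor
mode is running and knows where to send mail. On Debian that is
roughly the following, with the address being just a placeholder:)

    # /etc/mdadm/mdadm.conf
    MAILADDR admin@example.com

    # the monitor itself, normally started from the init scripts:
    mdadm --monitor --scan --daemonise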
What happened instead:
- A user tries to read a file via Samba.
- Samba issues a read request to md.
md tries to read the file from one of the drives... the drive is
struggling to read a bad sector... Samba is waiting for md, md is
waiting for the drive, and the drive is trying again and again to read
this blasted sector like its life depended on it, while users see that
the network folder no longer responds at all.
This goes on forever, until the users call me; I come to investigate,
see Samba down and a pile of errors in dmesg, and manually mark the
drive as faulty.
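For the record, "manually mark the drive as faulty" was just the usual
mdadm dance (the device names here are examples, not necessarily what
that box used):

    mdadm /dev/md0 --fail /dev/sdb1      # kick the hanging drive out
    mdadm /dev/md0 --remove /dev/sdb1    # then drop it from the array
    cat /proc/mdstat                     # confirm the mirror is degraded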
Now, this happened a while ago, and that server was not running the
most recent kernel (I think it was 3.2 from Debian Wheezy, or something
a little newer from the backports). I also can't simply try it again on
a new server, because I can't take a working RAID1 with real data on it
and destroy some sectors to see what happens (though perhaps something
like the loop-device sketch in the P.S. below could fake it). So I just
want to ask: is that really how it works? Was that supposed to happen?
I thought the main point of RAID1 was to avoid exactly this kind of
downtime!.. Or is this a known issue that has been fixed in more recent
versions, so I should just update my kernels and expect different
behaviour next time?
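P.S. In case anyone wants to poke at this without sacrificing real
disks, here is a rough, untested sketch of faking a developing bad
sector with loop devices and device-mapper; all device names, sizes
and offsets are made up for illustration:

    # two 512 MB backing files standing in for drives
    truncate -s 512M disk0.img disk1.img
    losetup /dev/loop0 disk0.img
    losetup /dev/loop1 disk1.img

    # build the mirror on the healthy devices, put some data on it,
    # and let the initial resync finish (quick at this size)
    mdadm --create /dev/md9 --level=1 --raid-devices=2 \
          /dev/loop0 /dev/loop1
    mkfs.ext4 /dev/md9
    mount /dev/md9 /mnt    # ...fill it with files, then umount /mnt
    mdadm --stop /dev/md9

    # now "develop" a bad region on the second leg: remap loop1 so a
    # small range in the middle returns I/O errors; table lines are
    # "<start> <length> <target> ..." in 512-byte sectors
    printf '%s\n' \
        '0      524288 linear /dev/loop1 0' \
        '524288 128    error' \
        '524416 524160 linear /dev/loop1 524416' \
        | dmsetup create badleg

    # reassemble with the damaged leg, read the files back, and watch
    # dmesg and /proc/mdstat to see how md reacts
    mdadm --assemble /dev/md9 /dev/loop0 /dev/mapper/badleg
    mount /dev/md9 /mnt

A few caveats: md balances reads between the legs, so it may take some
tries before a read actually lands on the damaged range; you need
enough data (or a bigger error range) for some file to sit on the bad
spot; and the dm "error" target fails immediately instead of hanging
for ages like a retrying desktop drive, so this would show md's error
handling but not reproduce the long stall itself.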
--
darkpenguin