RE: Bad blocks are killing us!

dean gaudet <dean-list-linux-raid@xxxxxxxxxx> · Tue, 16 Nov 2004 15:29:16 -0800 (PST)

i've been trying to puzzle out a related solution... suppose there were 
"lock/unlock stripe" operations.  md would delay all i/o to a locked 
stripe indefinitely (or fail it after a timeout; either would work).
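
to make the semantics concrete, here's a toy userland model of what 
lock/unlock might look like.  this is only a sketch: StripeLocks and 
wait_for_io are names i made up for illustration, md has no such 
interface.  timeout=None models "delay indefinitely", a number models 
"error out after a timeout".

```python
import threading


class StripeLocks:
    """toy model of the proposed lock/unlock stripe semantics (hypothetical, not md's API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._locked = set()

    def lock(self, stripe):
        """mark a stripe as locked; i/o to it must now wait."""
        with self._cond:
            self._locked.add(stripe)

    def unlock(self, stripe):
        """release a stripe and wake any i/o that was waiting on it."""
        with self._cond:
            self._locked.discard(stripe)
            self._cond.notify_all()

    def wait_for_io(self, stripe, timeout=None):
        """block until the stripe is unlocked.

        timeout=None waits indefinitely; otherwise TimeoutError is raised
        if the stripe is still locked after `timeout` seconds.
        """
        with self._cond:
            unlocked = self._cond.wait_for(
                lambda: stripe not in self._locked, timeout
            )
            if not unlocked:
                raise TimeoutError(f"stripe {stripe} still locked")
```

i/o to an unlocked stripe passes straight through wait_for_io(); i/o to 
a locked one either sits there until unlock() or gets an error, 
depending on which policy is chosen.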

a userland daemon could then walk through an array, locking each stripe 
in turn and performing whatever corrective action it wants (raid5/6 
reconstruction, raid1 reconstruction by "voting" when there are more 
than 2 mirrors) before unlocking it.
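
as a rough illustration of the two corrective actions mentioned above, 
here's a minimal sketch.  vote() and xor_rebuild() are hypothetical 
helpers, not anything md provides.  the voting rule assumed here is a 
strict majority of identical copies, and the raid5 case assumes plain 
xor parity with exactly one missing block.

```python
from collections import Counter


def vote(copies):
    """pick the majority content among raid1 mirror copies.

    returns (winner, bad_indices).  if no copy has a strict majority,
    returns (None, []) because there is nothing safe to repair from.
    """
    winner, count = Counter(copies).most_common(1)[0]
    if count * 2 <= len(copies):
        return None, []
    bad = [i for i, c in enumerate(copies) if c != winner]
    return winner, bad


def xor_rebuild(blocks):
    """xor equal-length blocks together.

    with raid5-style parity, xor-ing all surviving blocks of a stripe
    (data and parity) yields the one missing block.
    """
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)
```

with 3 mirrors holding b"aa", b"aa", b"ab", vote() picks b"aa" and 
flags mirror 2 for rewrite.  with only 2 mirrors that disagree there is 
no majority, which is why voting only helps for >2 ways.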

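there are some deadlock scenarios the daemon must avoid (e.g. needing 
to page in or allocate memory while that requires i/o to a stripe it 
has locked), so it would have to use mlockall() and have static memory 
allocation requirements.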

that gives us a proactive detect/repair solution... maybe it also gives us 
a reactive solution:  when a read error occurs, md could lock the stripe, 
wake the daemon, let it repair and unlock.
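
the reactive path could look something like the following toy 
simulation.  everything here is a stand-in: on_read_error() plays the 
role of md, the queue plays the role of whatever wakeup mechanism would 
exist, and the "repair" is just a placeholder that records the stripe 
number.

```python
import queue
import threading


def run_reactive_demo(failed_stripes):
    """simulate: read error -> lock stripe -> wake daemon -> repair -> unlock.

    returns (repaired, still_locked) after all wakeups are handled.
    """
    locked = set()           # stripes whose i/o is being held (toy stand-in)
    repaired = []            # what the daemon fixed, in order
    wakeups = queue.Queue()  # md -> daemon notifications

    def on_read_error(stripe):
        # what md would do in this model: lock first, then wake the daemon
        locked.add(stripe)
        wakeups.put(stripe)

    def daemon():
        while True:
            stripe = wakeups.get()
            if stripe is None:        # sentinel: shut down
                break
            repaired.append(stripe)   # placeholder for real reconstruction
            locked.discard(stripe)    # unlock so held i/o can proceed

    worker = threading.Thread(target=daemon)
    worker.start()
    for stripe in failed_stripes:
        on_read_error(stripe)
    wakeups.put(None)
    worker.join()
    return repaired, locked
```

the ordering is the point: the stripe is locked before the daemon is 
woken, so no other i/o can touch it between the error and the repair.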

there are good and bad aspects to this idea... i figured it's worth 
mentioning though.

-dean