Hey Piergiorgio,

On Saturday, June 30, 2012 01:48:31 PM Piergiorgio Sartor wrote:
> > the tool currently can detect failure of a single slot, and it
> > could automatically repair that, I chose to make repair an
> > explicit action. In fact, even the slice number and the two slots
> > to repair are given via the command line.
> >
> > So for example, given this output of raid6check (check mode):
> > Error detected at 1: possible failed disk slot: 5 --> /dev/sda1
> > Error detected at 2: possible failed disk slot: 3 --> /dev/sdb1
> > Error detected at 3: disk slot unknown
> >
> > To regenerate 1 and 2, run:
> > raid6check /dev/md0 repair 1 5 3
> > raid6check /dev/md0 repair 2 5 3
> > (the repair arguments require you to always rebuild two blocks,
> > one of which should result in a noop in these cases)
>
> Why always two blocks?

The reason is simply to have fewer cases to handle in the code. There
are already three ways to regenerate two blocks (D&D, D/P&Q and D&P),
and there would be two more cases if only one block were to be
repaired. With the original patch, if you can repair two blocks, that
allows you to repair one (and one other in addition) as well.

> > Since for stripe 3, two slots must be wrong, the admin has to
> > provide a
> Well, "unknown" means it is not possible to detect
> which one(s).
> It could be there are more than 2 corrupted.
> The "unknown" case means that the only reasonable thing
> would be to rebuild the parities, but nothing more can
> be said about the status of the array.
>
> Nevertheless, there is a possibility which I was thinking
> about, but I never had time to implement (even if the
> software has some already built-in infrastructure for it).
> Specifically, a "vertical" statistic.
> That is, if there are mismatches, and, for example, 90% of
> them belong to /dev/sdX, and the rest 10% are "unknown",
> then it could be possible to extrapolate that, for the
> "unknown", /dev/sdX must be fixed anyway and then re-check
> if the status is still "unknown" or some other disk shows
> up. If one disk is reported, then it could be fixed.
> Other cases, the parity must be adjusted, whatever this
> means in terms of data recovery.
>
> Of course, this is just a statistical assumption, which
> means a second, "aggressive", option will have to be
> available, with all the warnings of the case.

As you point out, it is impossible to determine which slots are in
error once two of them have failed. I would leave such a decision to an
admin, but giving one or more "advices" may be a nice idea.

Personally, I am recovering from a simultaneous three-disk failure on a
backup storage. My best hope was to ddrescue "most" from all three
disks onto fresh ones, and I lost a total of a few KB on each disk.
Using the ddrescue log, I can even say which sectors of each disk were
damaged. Interestingly, two disks of the same model failed on the very
same sector (even though they were produced at different times), so I
now have "unknown" slot errors in some stripes. But with context
information, I am certain I know which slots need to be repaired.

> > guess (and could iterate guesses, provided proper stripe backups):
> > raid6check /dev/md0 repair 3 5 3
>
> Actually, this could also be an improvement, I mean
> the possibility to backup stripes, so that other,
> advanced, recovery could be tried and reverted, if
> necessary.

That is true. I was thinking about this too. Unfortunately, as I
remember, the functions to save and restore stripes in restripe.c do
not save P and Q, which we should in order to redo the data block
calculation. But with stripe backups, one could even imagine doing
verifications on upper layers -- such as verifying file(system)
checksums.

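The backup-and-revert idea can be sketched roughly as follows. This is only an illustration in Python, not the restripe.c interface: the names StripeBackup, backup_stripe, restore_stripe and try_guess are hypothetical, and the repair and verify callables are stand-ins for an actual repair step and an upper-layer check. The one point it tries to show is that every slot, including P and Q, is copied before a guess is applied, so a rejected guess can be undone completely.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# One mutable block per slot: the data blocks plus P and Q (assumed layout).
Stripe = List[bytearray]


@dataclass(frozen=True)
class StripeBackup:
    """Immutable copy of every block in one stripe, P and Q included."""
    stripe_no: int
    blocks: Tuple[bytes, ...]


def backup_stripe(stripe_no: int, stripe: Stripe) -> StripeBackup:
    # Copy all slots, not just the data blocks, so the original
    # parity is still available for a later recalculation.
    return StripeBackup(stripe_no, tuple(bytes(b) for b in stripe))


def restore_stripe(backup: StripeBackup, stripe: Stripe) -> None:
    # Overwrite each block in place with its saved contents.
    for slot, saved in enumerate(backup.blocks):
        stripe[slot][:] = saved


def try_guess(stripe_no: int, stripe: Stripe,
              repair: Callable[[Stripe], None],
              verify: Callable[[Stripe], bool]) -> bool:
    """Apply one repair guess; revert from the backup if verify rejects it."""
    saved = backup_stripe(stripe_no, stripe)
    repair(stripe)
    if verify(stripe):
        return True
    restore_stripe(saved, stripe)
    return False
```

An admin-side loop could then call try_guess once per candidate slot pair, with verify checking something above the block layer (a file or filesystem checksum, say), and stop at the first guess that passes.
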
I may send another patch implementing this, but I wanted to get general
feedback on inclusion of such changes first (Neil?).

> Finally, someone should consider to use the optimized
> raid6 code, from the kernel module (can we link that
> code directly?), in order to speed up the check/repair.

I am a big supporter of getting it to work first, then making it fast.
Since a full raid check takes on the order of hours anyway, I do not
mind that repairing blocks from user space takes five minutes when it
could be done in three. That said, I think the faster code in the
kernel is warranted (as it needs this calculation very often when a
disk is failed), and if it is possible to reuse easily, we sure should.

Cheers,
Robert