problem statement regarding --re-add

Hi,

Ubuntu 16.04.4 LTS with the mdadm included in that distribution.

I just had quite a scare. A 10-disk RAID6 (9 drives plus 1 spare) had 4 drives kicked from it: a single drive error cascaded until every drive connected to that controller was thrown out. The array had a write-intent bitmap enabled.

After a while, xfs gave up and stopped writing to the filesystem. As a result the drives ended up with slightly different event counts, but the difference between any two drives was less than 100.
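To make the "difference in event count" concrete, here is a minimal sketch that compares the Events field from the --examine output of each drive. It assumes the v1.x-style "Events : N" line; the device names and numbers in the sample are made up for illustration and are not from my array.

```python
import re

# Matches the "Events : N" line as printed by `mdadm --examine` for v1.x
# metadata (assumed format; other metadata versions may differ).
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$", re.MULTILINE)


def parse_events(examine_text):
    """Return the event count from one device's --examine text, or None."""
    match = EVENTS_RE.search(examine_text)
    return int(match.group(1)) if match else None


def event_spread(examine_by_device):
    """Map each device to how far its event count lags behind the newest."""
    counts = {dev: parse_events(text) for dev, text in examine_by_device.items()}
    known = {dev: n for dev, n in counts.items() if n is not None}
    if not known:
        return {}
    newest = max(known.values())
    return {dev: newest - n for dev, n in known.items()}


# Illustrative input only: hypothetical devices and event counts.
sample = {
    "/dev/sda": "         Events : 48213\n",
    "/dev/sdb": "         Events : 48150\n",
    "/dev/sdc": "         Events : 48209\n",
}
lag = event_spread(sample)
```

A drive with a lag of 0 has the newest superblock; the others are the ones that --assemble would normally refuse without --force.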

So what I did next was --assemble --force on all 10 drives. This gave me an array with 7 drives plus 1 spare, and it started rebuilding. The problem was that one of the 7 drives was the one that had failed initially, so the rebuild failed (the drive seemed to have trouble sitting in a chassis but was fine standalone, so perhaps it has developed a vibration intolerance). I then tried to --re-add --force the 2 drives that for some reason weren't included initially, but they just became spares: the superblock on them was wiped and they were made spares. So I still had only 6 working data drives, which is not enough.

What I ended up doing was running ddrescue on the failing drive, which recovered all of its data. I then did --assemble --force with 7 data drives (now that the data from the bad drive was on a better drive), after which I could add one spare, rebuild, add the two remaining spares, and I was back to normal. xfs_repair put some files/dirs in lost+found, but overall all files seem fine, so the order and layout were correct.
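The order of operations described above can be sketched as a list of commands. This only builds the argument lists and runs nothing; the device names and the map file name are hypothetical, and the exact options I used may have differed (the -f shown here is what GNU ddrescue needs in order to write to a block device).

```python
def recovery_plan(md_dev, failing, replacement, mapfile, healthy, spares):
    """Build (but do not run) the recovery commands as argv lists."""
    plan = [
        # Copy the failing drive onto a good one; -f allows a device as output.
        ["ddrescue", "-f", failing, replacement, mapfile],
        # Force-assemble from the healthy members plus the rescued copy.
        ["mdadm", "--assemble", "--force", md_dev, *healthy, replacement],
    ]
    if spares:
        # Add one spare first and let the rebuild finish.
        plan.append(["mdadm", md_dev, "--add", spares[0]])
        plan.append(["mdadm", "--wait", md_dev])
    if len(spares) > 1:
        # Then add the remaining spares.
        plan.append(["mdadm", md_dev, "--add", *spares[1:]])
    return plan


# Hypothetical device names, for illustration only.
plan = recovery_plan(
    "/dev/md0", "/dev/sdX", "/dev/sdY", "rescue.map",
    healthy=["/dev/sda", "/dev/sdb"],
    spares=["/dev/sdp", "/dev/sdq", "/dev/sdr"],
)
for argv in plan:
    print(" ".join(argv))
```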

Now, to the reason I wrote problem statement in the subject line:

I would like a way to do --re-add that ignores the event count (--force), but refuses to add the drive if the conditions for --re-add aren't fulfilled, meaning the drive should never have its role and the rest of the superblock drastically changed (such as being made a spare). If for some reason it can't be re-added properly, it should be ignored and left alone. Perhaps what I am asking for is a new option, --ignore-event-count, instead of the more generic "--force"?
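As a rough model of the behaviour I am asking for (not of how mdadm works today), the decision could look like the sketch below. The Member record, the readd_decision function and the ignore_event_count flag are all hypothetical names, and the event check is simplified to an exact match rather than the bitmap-based check a real re-add would use. The point is that the only outcomes are "re-add" and "skip"; there is no path that turns the drive into a spare.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Member:
    """Hypothetical summary of one device's superblock."""
    array_uuid: str
    role: Optional[int]  # slot the device held, None if it never had one
    events: int


def readd_decision(member, array_uuid, array_events, open_slots,
                   ignore_event_count=False):
    """Return "re-add" or "skip"; never rewrites the device as a spare."""
    if member.array_uuid != array_uuid:
        return "skip"  # belongs to a different array
    if member.role is None or member.role not in open_slots:
        return "skip"  # its old slot is not available
    if member.events != array_events and not ignore_event_count:
        return "skip"  # stale, and the caller did not ask to ignore that
    return "re-add"


# Hypothetical drive that is 63 events behind the array.
stale = Member(array_uuid="uuid-A", role=3, events=48150)
print(readd_decision(stale, "uuid-A", 48213, {3, 5}))
print(readd_decision(stale, "uuid-A", 48213, {3, 5}, ignore_event_count=True))
```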

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx