On Thu, 11 Apr 2013 15:27:03 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx> wrote: > DM RAID: Add ability to restore transiently failed devices on resume > > This patch adds code to the resume function to check over the devices > in the RAID array. If any are found to be marked as failed and their > superblocks can be read, an attempt is made to reintegrate them into > the array. This allows the user to refresh the array with a simple > suspend and resume of the array - rather than having to load a > completely new table, allocate and initialize all the structures and > throw away the old instantiation. > > Signed-off-by: Jonathan Brassow <jbrassow@xxxxxxxxxx> Is this really a good idea? Just because you can read the superblock, that doesn't mean you can read any other block in the array. If the auto-recovery were optional I might consider it, but having it be enforced on every suspend/resume just doesn't seem safe to me. Have I misunderstood? Is this that hard to achieve in user-space? > + if (test_bit(Faulty, &r->flags) && r->sb_page && > + !read_disk_sb(r, r->sb_size)) { I know it is fairly widely used, especially for 'strcmp', but I absolutely despise this construct. !read_disk_sb() sound like "not read_disk_sb" or "could not read the disk's superblock" but it actually means "did not fail to read the disk's superblock" - the exact reverse. If 0 means success, I like to see an explicit test for 0: read_disk_sb() == 0 NeilBrown
Attachment:
signature.asc
Description: PGP signature