Re: [PATCH] DM RAID: Add ability to restore transiently failed devices on resume

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 11 Apr 2013 15:27:03 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx>
wrote:

> DM RAID: Add ability to restore transiently failed devices on resume
> 
> This patch adds code to the resume function to check over the devices
> in the RAID array.  If any are found to be marked as failed and their
> superblocks can be read, an attempt is made to reintegrate them into
> the array.  This allows the user to refresh the array with a simple
> suspend and resume of the array - rather than having to load a
> completely new table, allocate and initialize all the structures and
> throw away the old instantiation.
> 
> Signed-off-by: Jonathan Brassow <jbrassow@xxxxxxxxxx>

Is this really a good idea?  
Just because you can read the superblock, that doesn't mean you can read any
other block in the array.

If the auto-recovery were optional I might consider it, but having it be
enforced on every suspend/resume just doesn't seem safe to me.

Have I misunderstood?

Is this that hard to achieve in user-space?


> +			if (test_bit(Faulty, &r->flags) && r->sb_page &&
> +			    !read_disk_sb(r, r->sb_size)) {

I know it is fairly widely used, especially for 'strcmp', but I absolutely
despise this construct.
  !read_disk_sb()
sound like
  "not read_disk_sb" or "could not read the disk's superblock" but it
actually means "did not fail to read the disk's superblock" - the exact
reverse.
If 0 means success, I like to see an explicit test for 0:
   read_disk_sb() == 0

NeilBrown

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux