Re: [PATCH 4/4 v4] mdmon: bad block support for external metadata - clear bad blocks

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 27 Oct 2016, Tomasz Majchrzak wrote:
> If an update of acknowledged bad blocks file is notified, read entire
> bad block list from sysfs file and compare it against local list of bad
> blocks. If any obsolete entries are found, remove them from metadata.
> 
> As mdmon cannot perform any memory allocation, new superswitch method
> get_bad_blocks is expected to return a list of bad blocks in metadata
> without allocating memory. It's up to metadata handler to allocate all
> required memory in advance.

hi,
 only 5 years late to this party :-)

 I recently had cause the look at this code and ... there are problems.

 Primarily, it assumes that the "bad_blocks" file contains a complete
 list of bad blocks known to the kernel.  This is not correct.
 As the documentation and nearby comments say, the contents of this file
 is truncated to PAGE_SIZE.  It is not meant to be a complete list, only
 an indicative list.
 There is no way to get a complete list from the kernel once the list
 gets too large.  Probably we should design and implement a reliable way
 to extract this information.  I imagine it would be something like
 unacknowledged_bad_blocks, in that mdmon could read some information,
 then confirm that it has been read, then read some more.  But until
 that it done, this code should be careful not to assume that the list
 is complete - at least not without checking.

 Secondly, the interface with the metadata handler is a bit odd.
 The 'check_for_cleared_bb' essentially does:
   - call ->record_bad_block  for all blocks known to the kernel
   - call ->clear_bad_block   for all blocks that were in the metadata
  in that order.
  It isn't quite that simple as there are optimisations:
    if a range from kernel exactly matches a range in metadata, the
      range is neither recorded or cleared
    If a range from the kernel is a subrange of a range in metadata,
      then the larger range is cleared before the new range is added,
      AS WELL AS after.

  If there are other overlaps, then the kernel range is recorded
  before the metadata range is cleared.  This *seems* wrong.  I would
  expect this to clear part of the range that had just been added.

  However, it doesn't.  The ->clear_bad_block interface *DOESN'T* remove
  all the block in the range from the bbl.  Rather, if the exact range
  given appears as one of the ranges in the bbl, then that range is
  deleted.  Otherwise no change happens.
  These semantics are surprising.  The net result is that the code
  probably works with the imsm backend.  However if someone else wrote a
  different backend which implemented ->clear_bad_block to actually
  remove the entire range from the bbl, then it would clear more blocks
  than it should.

  I think it would be really good to re-implement this code in a way
  that was more maintainable.
  I don't think "check_for_cleared_bb()" should *ever* record new bad
  block ranges.  They get recorded through the unacknowledged_bad_block
  processing.  "check_for_cleared_bb()" should ONLY delete blocks from the
  bbl, and it should ONLY do that if it certain that the information in
  "bad_blocks" is complete.

  It should read bad_blocks in a single read().  If the returned data
  ends with a newline, and is not a power-of-2 in size, then it is
  safe to assume that it is complete.
  If it doesn't end with a newline, then it is definitely not complete.
  If it is a power-of-2 less than 4096, then it can be assumed to be
  complete.  If it is exactly 4096 bytes, or a larger power of two, then
  it is not safe to assume that it is complete.

Thanks,
NeilBrown



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux