On Thu, Jan 28 2016, Sarah Newman wrote:

> On 01/27/2016 07:19 PM, NeilBrown wrote:
>> On Thu, Jan 28 2016, Sarah Newman wrote:
>>
>>> I experienced the following problems with the mdadm bad blocks list:
>>>
>>> 1. Additions to the bad block list do not cause an email to be sent by the mdadm monitor. Expected behavior is for an email to be sent as soon as the
>>> bad blocks list becomes non-empty.
>>
>> Yes, that would be a good idea.  If you do develop patches, please post
>> them.
>
> Will do, but I don't have a definite time frame for it.
>
>>
>>> 2. /proc/mdstat does not show any indication that there are bad blocks present on an md member. Specifically, the status for the raid personality
>>> should show something other than "U" if the badblocks list is not empty for that member (maybe "B"?)
>>
>> I'd like to deprecate /proc/mdstat.  It is not really easy to extend.
>> People might have programs that parse it which could break if you change
>> 'U' to 'B'.
>> I'd recommend using "mdadm" to get status of an array, or examine files
>> in /sys.
>
> If /proc/mdstat isn't going to be updated, is it going to be removed? If not and changing 'U' to 'B' isn't acceptable, then what about adding a flag
> to the device? Example

Removing is not better than changing.  Legacy is a problem...

>
> md0 : active raid1 sda1[1] sdb1[2](B)

That might be acceptable.  There is precedent for that sort of change.

>
> Where is the bad blocks list in /sys?

/sys/block/mdXXX/md/dev-YYY/bad_blocks

>
>>
>>> 3. Adding a device when there is an md member with bad blocks does not appear to trigger a rebuild, meaning there could be at least one good copy of
>>> all the data but no way to get all good data on a single device without expanding the entire array.
>>
>> Good point.  That would be quite easy to change.  Just set
>> WantReplacement if the bad block list is ever non-empty.
>> Not sure it is always a good idea though.  You can have a bad block on a
>> perfectly good device if the device it was recovered from has a bad
>> block.
>> You only really want to set WantReplacement automatically if a write
>> fails.  We do do that, but if you stop and restart an array the fact
>> that a write failed can be forgotten.
>
> Yes, I am quite aware there can be a bad block on a perfectly good device. But in a mirror if there are multiple perfectly good devices that each have
> bad blocks marked for whatever reason, the only way to get back to a single good device is to rebuild off of all of them. Speaking as a user, this is
> what I would want to happen.

Performing a "check" - e.g.
   echo check > /sys/block/mdXXX/md/sync_action
should do that.  I'm not certain that it does but it is an avenue worth
exploring and possibly fixing.
Running "check" on a regular basis is something everyone should do
(there is a script in mdadm to help with this).

>
>> I'm not convinced that it is harmful, though I accept that it is not perfect.
>
> Yes. You both know the current behavior of mdadm perfectly and probably didn't just experience data loss.

Fair comment.

>
> The old behavior was to fail immediately and alert if there was a problem rather than silently accepting errors. I expect there are some people who
> think they have a good RAID, but don't, based on /proc/mdstat and lack of errors from mdadm monitor.
>
> Thanks, Sarah

Getting feedback like this is an important part of making MD better!
I'm unlikely to be coding any changes myself in the immediate future but
I'm very happy to discuss them.

Thanks,
NeilBrown
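
Until the monitor and /proc/mdstat report bad blocks, the sysfs file mentioned above can be polled directly. The following is only a rough sketch, not part of mdadm: it assumes each line of bad_blocks holds a start sector and a length separated by whitespace, and the function names parse_bad_blocks and find_bad_blocks are made up for illustration.

```python
import glob
import os


def parse_bad_blocks(text):
    """Parse bad_blocks contents, assumed to be one "start length" pair per line."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        try:
            ranges.append((int(fields[0]), int(fields[1])))
        except ValueError:
            continue  # skip lines that are not two integers
    return ranges


def find_bad_blocks(sys_block="/sys/block"):
    """Return {(array, member): [(start, length), ...]} for non-empty lists only."""
    found = {}
    # Mirrors the path given above: /sys/block/mdXXX/md/dev-YYY/bad_blocks
    pattern = os.path.join(sys_block, "md*", "md", "dev-*", "bad_blocks")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                ranges = parse_bad_blocks(f.read())
        except OSError:
            continue  # member removed or file unreadable
        if ranges:
            parts = path.split(os.sep)
            array = parts[-4]
            member = parts[-2][len("dev-"):]
            found[(array, member)] = ranges
    return found


if __name__ == "__main__":
    for (array, member), ranges in find_bad_blocks().items():
        total = sum(length for _, length in ranges)
        print(f"{array}: {member} has {len(ranges)} bad range(s), {total} sectors")
```

Run from cron, any output from this script could be mailed to the administrator, which approximates the alert asked for in point 1 without changing mdadm itself.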