Re: [PATCH] MD: Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 22 Apr 2013 11:26:16 -0500 Brassow Jonathan <jbrassow@xxxxxxxxxx>
wrote:

> 
> On Apr 21, 2013, at 7:36 PM, NeilBrown wrote:
> 
> > On Fri, 19 Apr 2013 15:09:21 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx>
> > wrote:
> > 
> >> MD:  Do not increment resync_mismatches unless MD_RECOVERY_REQUESTED
> >> 
> >> resync_mismatches is used to display the number of differences that
> >> have been found or repaired during a scrubbing operation.  It is not
> >> meant to count anything during resync or repair operations.  (How
> >> much sense does it make to find resync_mismatches populated after an
> >> initial synchronization of the array?  After cleaning-up an unclean
> >> shutdown?  After [re]integrating a device into an existing array?)
> >> The incrementing of the variable must be restricted to when the user
> >> initiates a scrubbing operation (i.e. "check" or "repair").
> > 
> > How do you know what it is "meant" to do? :-)
> 
> Yes, I suppose I did infer the meaning, but I don't think it's too much of a stretch - especially given the commit message where 'resync_mismatches' was introduced.
> commit 9d88883e68f404d5581bd391713ceef470ea53a9
> Author: NeilBrown <neilb@xxxxxxx>
> Date:   Tue Nov 8 21:39:26 2005 -0800
> 
>     [PATCH] md: teach raid5 the difference between 'check' and 'repair'.
>     
>     With this, raid5 can be asked to check parity without repairing it.  It also
>     keeps a count of the number of incorrect parity blocks found (mismatches) an
>     reports them through sysfs.
>     
>     Signed-off-by: Neil Brown <neilb@xxxxxxx>
>     Cc: Greg KH <greg@xxxxxxxxx>
>     Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
>     Signed-off-by: Linus Torvalds <torvalds@xxxxxxxx>
> Also, it seems that 'mismatches' is only ever talked about in reference to scrubbing operations.  (Although it does say in 'man 4 md', "This  is  set to zero when a scrub starts and is incremented whenever a sector is found that is a mismatch.", which could imply that 'mismatches' might be non-zero before a scrub operation starts.)  There is no indication anywhere that I can find what it would mean to have a non-zero 'mismatches' absent a scrub operation - perhaps we just assume "uninitialized".
> 
> However, if it has come to mean something more since then or provides any sort of useful information, then I can see your point.  I find the behavior to be problematic.  A user creates an array and sometimes the value is non-zero - that could be alarming, causing the user to think the array needs immediate "repair".
> 
> Anyway, I suppose I am simply bloviating if your mind is made up.  I can make due with the idea below of recording the "last_sync_action".  I'm thinking of recording the MD_RECOVERY_* flags that would define the type of sync_action - would you rather have a 'char *'?  If we kept the flags, I wonder if we could also use that to define acceptable sync_action transitions...  We don't, for example, particularly want "resync" -> "idle" -> "check" -> "idle" -> "resync" - especially if checkpoints are being made throughout.
> 
> > While it might not be particularly useful, I see no point in hiding
> > information, and no desire to change what mismatch_cnt reports.
> > 
> > It may well be useful to somehow report what the real meaning for
> > mismatch_cnt is.  i.e. to report what the last sync_action was.
> > Then any use-space tool that reported mismatch_cnt, could adjust according to
> > whether the last operation was check/repair or resync/recovery/reshape. etc.
> > 
> > So if you were to provide a patch which adds "last_sync_action" or similar,
> > I'd certainly consider that.
> 
>  brassow

"bloviating" : Talk at length, esp. in an inflated or empty way.

I learnt a new word today - thanks!

On reflection, not all RAID personalities count mismatches for resync.  RAID1
doesn't (so your change to raid1.c was unnecessary).
So assuming that it means anything after a 'sync' is already invalid.

You are right that people get confused by it, but they get confused that it
is non-zero after a "repair" mostly, and we cannot "fix" that.

I think the really useful thing here is to record the cause of the
mismatch_cnt - why we even counted mismatches.  I'd probably record it as a
string but I don't think it matter much either way: whatever makes the code
simple is good.

And if we do record the action which generated mismatch_cnt, then having the
mismatch_cnt file appear empty when the action was not check or repair is
probably valid after all.

I don't think there is any extra need to enforce sync_action transitions.
If you write "idle" while a sync is needed, it will interrupt the sync then
immediately restart it again.
You could possible write "frozen" and then "check".  That might do something
odd, and possibly should  be fixed.
However the choice of next action doesn't depend on what has been done
previously, but on the current state of the array.  So there is no need to
record extra state, but there might be a need to validate some transitions,
particularly involving 'frozen'.

NeilBrown

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux