Re: [PATCH V3 08/13] md: set MD_CHANGE_PENDING in a spinlocked region

On Fri, Apr 29, 2016 at 11:26:32AM +1000, NeilBrown wrote:
> On Thu, Apr 28 2016, Shaohua Li wrote:
> 
> > On Wed, Apr 27, 2016 at 10:55:43PM -0400, Guoqing Jiang wrote:
> >> 
> >> 
> >> On 04/27/2016 11:27 AM, Shaohua Li wrote:
> >> >On Tue, Apr 26, 2016 at 09:56:26PM -0400, Guoqing Jiang wrote:
> >> >>Some code waits for a metadata update by:
> >> >>
> >> >>1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
> >> >>2. setting MD_CHANGE_PENDING and waking the management thread
> >> >>3. waiting for MD_CHANGE_PENDING to be cleared
> >> >>
> >> >>If the first two are done without locking, the code in md_update_sb()
> >> >>which checks if it needs to repeat might test if an update is needed
> >> >>before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
> >> >>in the wait returning early.
> >> >>
> >> >>So make sure all places that set MD_CHANGE_PENDING are protected by
> >> >>mddev->lock.
> >> >>
> >> >>Reviewed-by: NeilBrown <neilb@xxxxxxxx>
> >> >>Signed-off-by: Guoqing Jiang <gqjiang@xxxxxxxx>
> >> >>---
> >> >>V3 changes:
> >> >>1. use spin_lock_irqsave/spin_unlock_irqrestore in error funcs and
> >> >>    raid10's __make_request
> >> >Shouldn't the other places use spin_lock_irq/spin_unlock_irq? An interrupt
> >> >can occur after you do spin_lock(), and if it calls md_error, we deadlock.
> >> 
> >> It is possible in theory, if the function is interrupted by md_error
> >> after it has called spin_lock, but many places in md.c already use plain
> >> spin_lock/spin_unlock on mddev->lock; md_do_sync and md_update_sb, for
> >> example, both already use spin_lock(&mddev->lock) and
> >> spin_unlock(&mddev->lock).
> >> 
> >> So I guess it will not cause trouble. Otherwise we would need to change
> >> all the existing uses of spin_lock/spin_unlock(&mddev->lock), or introduce
> >> a new lock for this scenario. I am not sure which option is more
> >> acceptable.
> >
> > It doesn't cause trouble today, because no interrupt/bh context took the
> > lock before. But now we take it in softirq context, and that's the
> > difference. Please enable lockdep; I think it will complain. Either we
> > change all the locking to the irqsave variants or we introduce a new lock.
> > Either is ok.
> 
> Thanks for catching this!
> 
> As you say, the straightforward solution is to change all the current
>   spin_lock(&mddev->lock);
> to
>   spin_lock_irq(&mddev->lock);
> 
> and similar for unlock.  Except where it can be called from interrupts
> we have to use spin_lock_irqsave().
> 
> There is another option that occurs to me.  Not sure if it is elegant or
> ugly, so I'm keen to see what you think.
> 
> In the places where we set MD_CHANGE_PENDING and one other bit -
> either MD_CHANGE_DEVS or MD_CHANGE_CLEAN - we could use set_mask_bits
> to set them both atomically.
> 
>   set_mask_bits(&mddev->flags, 0, BIT(MD_CHANGE_PENDING) | BIT(MD_CHANGE_DEVS));
> 
> Then in md_update_sb, when deciding whether to loop back to "repeat:" we
> use a new "bit_clear_unless".
> 
> #define bit_clear_unless(ptr, _clear, _test)			\
> ({								\
> 	const typeof(*ptr) clear = (_clear), test = (_test);	\
> 	typeof(*ptr) old, new;					\
> 								\
> 	do {							\
> 		old = ACCESS_ONCE(*ptr);			\
> 		new = old & ~clear;				\
> 	} while (!(old & test) && cmpxchg(ptr, old, new) != old);\
> 								\
> 	!(old & test);							\
> })
> 
> The code in md_update_sb() would be
> 
>  if (mddev->in_sync != sync_req ||
>      !bit_clear_unless(&mddev->flags, BIT(MD_CHANGE_PENDING),
>                        BIT(MD_CHANGE_DEVS)|BIT(MD_CHANGE_CLEAN)))
>         goto repeat;
> 
> 
> So if either DEVS or CLEAN were set, PENDING would not be cleared and the
> code would goto repeat.
> 
> What do you think?

Looks great, I like this idea. Thanks!
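
To spell out how the two halves would pair up (just a sketch: the
md_update_sb() fragment is the one quoted above, the writer-side call site
is illustrative rather than a literal hunk from the patch, and it assumes
the three-argument set_mask_bits() from bitops.h):

	/* Writer side, e.g. an error path that needs a superblock update.
	 * Both bits become visible together, so md_update_sb() can never
	 * observe PENDING without the bit that says what changed. */
	set_mask_bits(&mddev->flags, 0,
		      BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_PENDING));
	md_wakeup_thread(mddev->thread);

	/* Reader side in md_update_sb(): clear PENDING only if neither DEVS
	 * nor CLEAN has been set again in the meantime; otherwise keep
	 * PENDING set and loop back to "repeat:". */
	if (mddev->in_sync != sync_req ||
	    !bit_clear_unless(&mddev->flags, BIT(MD_CHANGE_PENDING),
			      BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_CLEAN)))
		goto repeat;

Since both sides are single atomic read-modify-write operations on
mddev->flags, this path would not need mddev->lock at all, which is what
makes it attractive compared with converting every locker to the
irq-disabling variants.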


