Re: [PATCH V3 08/13] md: set MD_CHANGE_PENDING in a spinlocked region

On Thu, Apr 28 2016, Shaohua Li wrote:

> On Wed, Apr 27, 2016 at 10:55:43PM -0400, Guoqing Jiang wrote:
>> 
>> 
>> On 04/27/2016 11:27 AM, Shaohua Li wrote:
>> >On Tue, Apr 26, 2016 at 09:56:26PM -0400, Guoqing Jiang wrote:
>> >>Some code waits for a metadata update by:
>> >>
>> >>1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
>> >>2. setting MD_CHANGE_PENDING and waking the management thread
>> >>3. waiting for MD_CHANGE_PENDING to be cleared
>> >>
>> >>If the first two are done without locking, the code in md_update_sb()
>> >>which checks if it needs to repeat might test if an update is needed
>> >>before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
>> >>in the wait returning early.
>> >>
>> >>So make sure all places that set MD_CHANGE_PENDING are protected by
>> >>mddev->lock.
>> >>
>> >>Reviewed-by: NeilBrown <neilb@xxxxxxxx>
>> >>Signed-off-by: Guoqing Jiang <gqjiang@xxxxxxxx>
>> >>---
>> >>V3 changes:
>> >>1. use spin_lock_irqsave/spin_unlock_irqrestore in error funcs and
>> >>    raid10's __make_request
>> >Shouldn't other places use spin_lock_irq/spin_unlock_irq? An interrupt
>> >can occur after you do spin_lock(), and if it's md_error, we deadlock.
>> 
>> It is possible in theory, if the function was interrupted by md_error
>> after it had taken spin_lock, but lots of places in md.c already use
>> plain spin_lock/spin_unlock on mddev->lock; take md_do_sync and
>> md_update_sb as examples, both of them already used
>> spin_lock(&mddev->lock) and spin_unlock(&mddev->lock).
>> 
>> So I guess it will not cause trouble; otherwise we would need to change
>> all the existing users of spin_lock/unlock(&mddev->lock), or introduce
>> a new lock for this scenario. I am not sure which one is more
>> acceptable.
>
> It doesn't cause trouble today, because until now no interrupt/bh path
> took the lock. But now we take it in softirq context, and that's the
> difference. Please enable lockdep; I think it will complain. Either we
> change all the locking to the irqsave variants or we introduce a new
> lock - either is ok.

Thanks for catching this!

As you say, the straightforward solution is to change all the current
  spin_lock(&mddev->lock);
to
  spin_lock_irq(&mddev->lock);

and similarly for unlock.  Except that in places which can be called from
interrupt context we have to use spin_lock_irqsave().
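
For concreteness, a setter that can run in interrupt or softirq context
(the error handlers) would end up looking something like this - the
function name here is only illustrative, not an actual md function:

  static void example_mark_sb_dirty(struct mddev *mddev)
  {
          unsigned long flags;

          /* This can be reached from interrupt/softirq context, so the
           * caller's irq state is unknown: save and restore it rather
           * than using spin_lock_irq()/spin_unlock_irq().  Process
           * context users of mddev->lock correspondingly become
           * spin_lock_irq(), so they cannot be interrupted while
           * holding the lock.
           */
          spin_lock_irqsave(&mddev->lock, flags);
          set_bit(MD_CHANGE_DEVS, &mddev->flags);
          set_bit(MD_CHANGE_PENDING, &mddev->flags);
          spin_unlock_irqrestore(&mddev->lock, flags);

          md_wakeup_thread(mddev->thread);
  }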

There is another option that occurs to me.  Not sure if it is elegant or
ugly, so I'm keen to see what you think.

In the places where we set MD_CHANGE_PENDING and one other bit -
either MD_CHANGE_DEVS or MD_CHANGE_CLEAN - we could use set_mask_bits
to set them both atomically.

  set_mask_bits(&mddev->flags, 0,
                BIT(MD_CHANGE_PENDING) | BIT(MD_CHANGE_DEVS));
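
For reference (quoting from memory, so the details may differ slightly
from what is in include/linux/bitops.h), set_mask_bits() is an
ACCESS_ONCE/cmpxchg loop of roughly this shape, which is why both flags
end up set in a single atomic update:

#define set_mask_bits(ptr, _mask, _bits)			\
({								\
	const typeof(*ptr) mask = (_mask), bits = (_bits);	\
	typeof(*ptr) old, new;					\
								\
	do {							\
		old = ACCESS_ONCE(*ptr);			\
		/* clear the mask bits, then set the new bits */\
		new = (old & ~mask) | bits;			\
	} while (cmpxchg(ptr, old, new) != old);		\
								\
	new;							\
})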

Then in md_update_sb, when deciding whether to loop back to "repeat:" we
use a new "bit_clear_unless".

#define bit_clear_unless(ptr, _clear, _test)			\
({								\
	const typeof(*ptr) clear = (_clear), test = (_test);	\
	typeof(*ptr) old, new;					\
								\
	do {							\
		old = ACCESS_ONCE(*ptr);			\
		new = old & ~clear;				\
	} while (!(old & test) && cmpxchg(ptr, old, new) != old);\
								\
	!(old & test);							\
})

The code in md_update_sb() would be

 if (mddev->in_sync != sync_req ||
     !bit_clear_unless(&mddev->flags, BIT(MD_CHANGE_PENDING),
                       BIT(MD_CHANGE_DEVS)|BIT(MD_CHANGE_CLEAN)))
        goto repeat;


So if either DEVS or CLEAN were set, PENDING would not be cleared and the
code would goto repeat.
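
Putting both halves of the pattern together, the writer side and the
waiter side would then look something like this (illustrative, not a
literal diff):

  /* Writer side (e.g. an error handler, possibly in softirq):
   * flag the update and PENDING in one atomic step, then poke
   * the management thread.
   */
  set_mask_bits(&mddev->flags, 0,
                BIT(MD_CHANGE_DEVS) | BIT(MD_CHANGE_PENDING));
  md_wakeup_thread(mddev->thread);

  /* Waiter side (step 3 in the description above): sleeps until
   * md_update_sb() has written the superblock and cleared PENDING
   * via bit_clear_unless().
   */
  wait_event(mddev->sb_wait,
             !test_bit(MD_CHANGE_PENDING, &mddev->flags));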

What do you think?

NeilBrown


