Re: [PATCH] md: Protecting mddev with barriers to avoid races

"NeilBrown" <neilb@xxxxxxx> · Fri, 22 May 2009 21:33:53 +1000 (EST)

On Fri, May 22, 2009 7:46 pm, SandeepKsinha wrote:
> Hi Neil,
>
>
> On Fri, May 22, 2009 at 2:28 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> On Fri, May 22, 2009 6:36 pm, Andre Noll wrote:
>>> On 13:48, SandeepKsinha wrote:
>>>> Signed-off-by: Sandeep K Sinha <sandeepksinha@xxxxxxxxx>
>>>> Date:   Fri May 22 13:46:56 2009 +0530
>>>>
>>>>     Protecting mddev with barriers to avoid races.
>>>>
>>>> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
>>>> index 9ad6ec4..e2dbda8 100644
>>>> --- a/drivers/md/linear.c
>>>> +++ b/drivers/md/linear.c
>>>> @@ -28,10 +28,12 @@
>>>>  static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
>>>>  {
>>>>         int lo, mid, hi;
>>>> -       linear_conf_t *conf = mddev_to_conf(mddev);
>>>> +       linear_conf_t *conf;
>>>>
>>>>         lo = 0;
>>>>         hi = mddev->raid_disks - 1;
>>>> +       barrier();
>>>> +       conf = mddev_to_conf(mddev);
>>>
>>> Are you sure that a compiler barrier is sufficient here? Maybe we
>>> need smp_mb(), but I'm not familar with all the different memory
>>> barrier types that exist in the kernel.
>>
>> I think we just want a read barrier here:
>>    smb_rmb();
>> because it is a barrier between reading mddev->raid_disk and
>> reading mddev->private.
>>
>> In linear_add, we want a write barrier
>>    smb_wmb();
>> before setting mdev->raid_disks, and I think we also want one
>> before setting mddev->private to make sure that all the assignments
>> to newconf->whatever are stable.
>
> Do we really need one before setting mddev->private? Will newconf be
> used by anyone before its assigned to mddev->private?
> No. Then what makes you put a write barrier before it?

I'm not 100% certain we do need it, but the logic goes like that:

One processor does
  newconf->disks[0].end_sector = X;
  mddev->private = newconf;

and another processor does
   conf = mddev->private;
   foo = conf->disk[0].end_sector;

Due to the lack of memory ordering guarantees, foo could end up
with the value of end_sector *before* X was assigned to it.

In fact, rcu_assign_pointer (in include/linux/rcupdate.h) has
exactly this purpose: place an smp_wmb before assigning the address
of a structure to a pointer, to ensure that the content of the
structure is globally visible before the address of the structure
become visible.
So I would probably do

   rcu_assign_pointer(mddev->private, newconf);
   smb_wmb();
   mddev->raid_disks ++;

An alternative (which actually might be an improvement, I'm not
sure) is to have a conf->raid_disks field and use that value in
which_dev.  Then it would be more like

   mddev->raid_disk++;
   newconf->raid_disk = mddev->raid_disks;
   rcu_assign_pointer(mddev->private, newconf);

and we only have one memory barrier.
The which_dev code then becomes:

   conf = rcu_dereference(mddev->private);
   hi = conf->raid_disks = 1;
   .....

and again we have only one memory barrier.  The significant benefit
of this is that instead of using the low level smp_[rw]mb() operations,
we use the higher level rcu_* interfaces which make it a lot more
obvious what we are doing.

It would then be a very small step use use rcu_dereference for
all mddev->private accesses in linear.c, wrap them in rcu_read_{,un}lock,
and then use call_rcu to arrange for the old conf_t to be freed.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html