Re: [PATCH] md: Protecting mddev with barriers to avoid races

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 22, 2009 at 5:03 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Fri, May 22, 2009 7:46 pm, SandeepKsinha wrote:
>> Hi Neil,
>>
>>
>> On Fri, May 22, 2009 at 2:28 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>> On Fri, May 22, 2009 6:36 pm, Andre Noll wrote:
>>>> On 13:48, SandeepKsinha wrote:
>>>>> Signed-off-by: Sandeep K Sinha <sandeepksinha@xxxxxxxxx>
>>>>> Date:   Fri May 22 13:46:56 2009 +0530
>>>>>
>>>>>     Protecting mddev with barriers to avoid races.
>>>>>
>>>>> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
>>>>> index 9ad6ec4..e2dbda8 100644
>>>>> --- a/drivers/md/linear.c
>>>>> +++ b/drivers/md/linear.c
>>>>> @@ -28,10 +28,12 @@
>>>>>  static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
>>>>>  {
>>>>>         int lo, mid, hi;
>>>>> -       linear_conf_t *conf = mddev_to_conf(mddev);
>>>>> +       linear_conf_t *conf;
>>>>>
>>>>>         lo = 0;
>>>>>         hi = mddev->raid_disks - 1;
>>>>> +       barrier();
>>>>> +       conf = mddev_to_conf(mddev);
>>>>
>>>> Are you sure that a compiler barrier is sufficient here? Maybe we
>>>> need smp_mb(), but I'm not familar with all the different memory
>>>> barrier types that exist in the kernel.
>>>
>>> I think we just want a read barrier here:
>>>    smb_rmb();
>>> because it is a barrier between reading mddev->raid_disk and
>>> reading mddev->private.
>>>
>>> In linear_add, we want a write barrier
>>>    smb_wmb();
>>> before setting mdev->raid_disks, and I think we also want one
>>> before setting mddev->private to make sure that all the assignments
>>> to newconf->whatever are stable.
>>
>> Do we really need one before setting mddev->private? Will newconf be
>> used by anyone before its assigned to mddev->private?
>> No. Then what makes you put a write barrier before it?
>
> I'm not 100% certain we do need it, but the logic goes like that:
>
> One processor does
>  newconf->disks[0].end_sector = X;
>  mddev->private = newconf;
>
> and another processor does
>   conf = mddev->private;
>   foo = conf->disk[0].end_sector;
>
> Due to the lack of memory ordering guarantees, foo could end up
> with the value of end_sector *before* X was assigned to it.
>

Not sure but can be a possibility. But I feel if thats the case, we
might have to revisit the whole md code. I see similar instances at
lot many places.


> In fact, rcu_assign_pointer (in include/linux/rcupdate.h) has
> exactly this purpose: place an smp_wmb before assigning the address
> of a structure to a pointer, to ensure that the content of the
> structure is globally visible before the address of the structure
> become visible.
> So I would probably do
>
>   rcu_assign_pointer(mddev->private, newconf);
>   smb_wmb();
>   mddev->raid_disks ++;
>
> An alternative (which actually might be an improvement, I'm not
> sure) is to have a conf->raid_disks field and use that value in
> which_dev.  Then it would be more like
>
>   mddev->raid_disk++;
>   newconf->raid_disk = mddev->raid_disks;
>   rcu_assign_pointer(mddev->private, newconf);
>
> and we only have one memory barrier.
> The which_dev code then becomes:
>
>   conf = rcu_dereference(mddev->private);
>   hi = conf->raid_disks = 1;
>   .....
>

I like this approach more. And this will surely keep consistency
throughout the code.

> and again we have only one memory barrier.  The significant benefit
> of this is that instead of using the low level smp_[rw]mb() operations,
> we use the higher level rcu_* interfaces which make it a lot more
> obvious what we are doing.
>
> It would then be a very small step use use rcu_dereference for
> all mddev->private accesses in linear.c, wrap them in rcu_read_{,un}lock,
> and then use call_rcu to arrange for the old conf_t to be freed.
>

I am going to try this.

Will keep you posted.


> NeilBrown
>
>



-- 
Regards,
Sandeep.





 	
“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux