Re: [PATCH] bcache: lock in btree_flush_write() to avoid races

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Tang Junhui <tang.junhui@xxxxxxxxxx>

Hello Kent

>The only purpose of rcu_read_lock() would be to ensure the object
>isn't freed out from under you. That's not an issue here.
> 
I do not think so. In for_each_cached_btree(), we traverse all
btrees by hlist_for_each_entry_rcu(), and the hlist_for_each_entry_rcu()
should be protect by rcu lock, as the API describle bellow:
 * This list-traversal primitive may safely run concurrently with
 * the _rcu list-mutation primitives such as hlist_add_head_rcu()
 * as long as the traversal is guarded by rcu_read_lock().
 */
#define hlist_for_each_entry_rcu(pos, head, member)			\
	for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\

>Racing with other writers isn't a correctness issue here, and if it
>was - just throwing write_lock around where you have it wouldn't be
>sufficient, you'd need something much more complicated.
> 
since we only read by hlist_for_each_entry_rcu(), so I think we does not 
need any more write locker.

>On Wed, Jan 24, 2018 at 8:50 PM,  <tang.junhui@xxxxxxxxxx> wrote:
>> From: Tang Junhui <tang.junhui@xxxxxxxxxx>
>>
>> Hello Kent && Nix
>>
>>>
>>>neither of those locks are needed - rcu_read_lock() isn't needed because we never
>>>free struct btree (except at shutdown), and we're not derefing journal there
>>
>> __bch_btree_node_write() is called in many places, in __bch_btree_node_write(),
>> before node writing over, it has changed the write buff index to another, but the
>> journal of changed write buff maybe still NULL if no journal writing occurred. But
>> luckily we only use the NULL as ZERO to calculate the fifo_idx, the result is a very
>> big value, so it does no harm. And since we cannot take mutex under rcu_read_lock,
>> we can ignore it.
>>
>> As to the rcu lock, though the btree is not freed, but we often delete it from one
>> list (such as one bucket_hash) and move it to another list (such as btree_cache_freed),
>> shouldn't it be protected by rcu locker?
>>
>>
>>>On Wed, Jan 24, 2018 at 5:30 AM, Nikolay Borisov <nborisov@xxxxxxxx> wrote:
>>>
>>>
>>>On 24.01.2018 08:54, tang.junhui@xxxxxxxxxx wrote:
>>>> From: Tang Junhui <tang.junhui@xxxxxxxxxx>
>>>>
>>>> In btree_flush_write(), two places need to take a locker to
>>>> avoid races:
>>>>
>>>> Firstly, we need take rcu read locker to protect the bucket_hash
>>>> traverse, since hlist_for_each_entry_rcu() must be called under
>>>> the protection of rcu read locker.
>>>>
>>>> Secondly, we need take b->write_lock locker to protect journal
>>>> of the btree node, otherwise, the btree node may have been
>>>> written, and the journal have been assign to NULL.
>>>>
>>>> Signed-off-by: Tang Junhui <tang.junhui@xxxxxxxxxx>
>>>> ---
>>>>  drivers/md/bcache/journal.c | 7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
>>>> index 02a98dd.505f9f3 100644
>>>> --- a/drivers/md/bcache/journal.c
>>>> +++ b/drivers/md/bcache/journal.c
>>>> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c)
>>>>  retry:
>>>>       best = NULL;
>>>>
>>>> -     for_each_cached_btree(b, c, i)
>>>> +     rcu_read_lock();
>>>> +     for_each_cached_btree(b, c, i) {
>>>> +             mutex_lock(&b->write_lock);
>>>
>>
>>>You can't sleep in rcu read critical section, yet here you take mutex
>>>which can sleep under rcu_read_lock.
>> To Nix: Yes, you are write. Good catch.
>>
>>>
>>>>               if (btree_current_write(b)->journal) {
>>>>                       if (!best)
>>>>                               best = b;
>>>> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c)
>>>>                               best = b;
>>>>                       }
>>>>               }
>>>> +             mutex_unlock(&b->write_lock);
>>>> +     }
>>>> +     rcu_read_unlock();
>>>>
>>>>       b = best;
>>>>       if (b) {
>>
>> Thanks,
>> Tang Junhui
>>

Thanks,
Tang Junhui




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux