The only purpose of rcu_read_lock() would be to ensure the object isn't freed out from under you. That's not an issue here. Racing with other writers isn't a correctness issue here, and if it was - just throwing write_lock around where you have it wouldn't be sufficient, you'd need something much more complicated. On Wed, Jan 24, 2018 at 8:50 PM, <tang.junhui@xxxxxxxxxx> wrote: > From: Tang Junhui <tang.junhui@xxxxxxxxxx> > > Hello Kent && Nix > >> >>neither of those locks are needed - rcu_read_lock() isn't needed because we never >>free struct btree (except at shutdown), and we're not derefing journal there > > __bch_btree_node_write() is called in many places, in __bch_btree_node_write(), > before node writing over, it has changed the write buff index to another, but the > journal of changed write buff maybe still NULL if no journal writing occurred. But > luckily we only use the NULL as ZERO to calculate the fifo_idx, the result is a very > big value, so it does no harm. And since we cannot take mutex under rcu_read_lock, > we can ignore it. > > As to the rcu lock, though the btree is not freed, but we often delete it from one > list (such as one bucket_hash) and move it to another list (such as btree_cache_freed), > shouldn't it be protected by rcu locker? > > >>On Wed, Jan 24, 2018 at 5:30 AM, Nikolay Borisov <nborisov@xxxxxxxx> wrote: >> >> >>On 24.01.2018 08:54, tang.junhui@xxxxxxxxxx wrote: >>> From: Tang Junhui <tang.junhui@xxxxxxxxxx> >>> >>> In btree_flush_write(), two places need to take a locker to >>> avoid races: >>> >>> Firstly, we need take rcu read locker to protect the bucket_hash >>> traverse, since hlist_for_each_entry_rcu() must be called under >>> the protection of rcu read locker. >>> >>> Secondly, we need take b->write_lock locker to protect journal >>> of the btree node, otherwise, the btree node may have been >>> written, and the journal have been assign to NULL. >>> >>> Signed-off-by: Tang Junhui <tang.junhui@xxxxxxxxxx> >>> --- >>> drivers/md/bcache/journal.c | 7 ++++++- >>> 1 file changed, 6 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c >>> index 02a98dd.505f9f3 100644 >>> --- a/drivers/md/bcache/journal.c >>> +++ b/drivers/md/bcache/journal.c >>> @@ -375,7 +375,9 @@ static void btree_flush_write(struct cache_set *c) >>> retry: >>> best = NULL; >>> >>> - for_each_cached_btree(b, c, i) >>> + rcu_read_lock(); >>> + for_each_cached_btree(b, c, i) { >>> + mutex_lock(&b->write_lock); >> > >>You can't sleep in rcu read critical section, yet here you take mutex >>which can sleep under rcu_read_lock. > To Nix: Yes, you are write. Good catch. > >> >>> if (btree_current_write(b)->journal) { >>> if (!best) >>> best = b; >>> @@ -385,6 +387,9 @@ static void btree_flush_write(struct cache_set *c) >>> best = b; >>> } >>> } >>> + mutex_unlock(&b->write_lock); >>> + } >>> + rcu_read_unlock(); >>> >>> b = best; >>> if (b) { > > Thanks, > Tang Junhui >