On Thu, Jun 02, 2022 at 01:26:10PM -0400, Waiman Long wrote:
> 
> On 6/2/22 12:58, Tejun Heo wrote:
> > Hello,
> > 
> > On Thu, Jun 02, 2022 at 09:35:43AM -0400, Waiman Long wrote:
> > > @@ -2011,9 +2076,16 @@ void blk_cgroup_bio_start(struct bio *bio)
> > >  	}
> > >  	bis->cur.ios[rwd]++;
> > > +	if (!READ_ONCE(bis->lnode.next)) {
> > > +		struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
> > > +
> > > +		llist_add(&bis->lnode, lhead);
> > > +		percpu_ref_get(&bis->blkg->refcnt);
> > 
> > Hmm... what guarantees that no two threads race here? llist assumes
> > that there's a single writer for a given llist_node, and the ref count
> > would be off too, right?
> 
> The llist_add() function is atomic. It calls into llist_add_batch() in
> lib/llist.c, which uses cmpxchg() to make the change. There is a
> non-atomic version, __llist_add(), which may be problematic in this
> case. Note that irqs are disabled in the u64_stats_update* critical
> section, so there shouldn't be a racing thread running on the same CPU,
> and other CPUs will modify their own lhead. Perhaps the non-atomic
> version could be used here as well.

Ah, right, this is per-cpu, so there can be no second writer trying to
add the same node at the same time.

Can you add a comment explaining the overall design / behavior?

Other than that, please feel free to add

  Acked-by: Tejun Heo <tj@xxxxxxxxxx>

Thanks.

-- 
tejun
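
[For reference: llist_add() reduces to llist_add_batch() in lib/llist.c,
whose core is a cmpxchg() retry loop. Below is a simplified sketch of the
single-node case, not the verbatim kernel source (the real function takes
a first/last pair so a whole chain can be pushed at once):

	/*
	 * Simplified sketch of the lock-free push behind llist_add().
	 * Safe against concurrent pushers on other CPUs because every
	 * producer serializes through the same cmpxchg() on head->first.
	 */
	static inline bool sketch_llist_add(struct llist_node *new,
					    struct llist_head *head)
	{
		struct llist_node *first;

		do {
			/* snapshot current head, link new node in front */
			new->next = first = READ_ONCE(head->first);
			/* retry if another CPU pushed since the snapshot */
		} while (cmpxchg(&head->first, first, new) != first);

		/* true if the list was previously empty */
		return !first;
	}

So concurrent llist_add() calls on the same head are fine; the
single-writer restriction applies only to a given llist_node, which the
!READ_ONCE(bis->lnode.next) check together with the per-cpu, irq-disabled
context enforces in the patch above.]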