On Thu, May 28, 2020 at 08:55:39PM +0200, Jan Kara wrote:
> On Thu 28-05-20 18:43:33, Luis Chamberlain wrote:
> > On Thu, May 28, 2020 at 08:31:52PM +0200, Jan Kara wrote:
> > > On Thu 28-05-20 07:44:38, Bart Van Assche wrote:
> > > > (+Luis)
> > > >
> > > > On 2020-05-28 02:29, Jan Kara wrote:
> > > > > Mostly for historical reasons, q->blk_trace is assigned through xchg()
> > > > > and cmpxchg() atomic operations. Although this is correct, sparse
> > > > > complains about this because it violates rcu annotations. Furthermore
> > > > > there's no real need for atomic operations anymore since all changes to
> > > > > q->blk_trace happen under q->blk_trace_mutex. So let's just replace
> > > > > xchg() with rcu_replace_pointer() and cmpxchg() with explicit check and
> > > > > rcu_assign_pointer(). This makes the code more efficient and sparse
> > > > > happy.
> > > > >
> > > > > Reported-by: kbuild test robot <lkp@xxxxxxxxx>
> > > > > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > > >
> > > > How about adding a reference to commit c780e86dd48e ("blktrace: Protect
> > > > q->blk_trace with RCU") in the description of this patch?
> > >
> > > Yes, that's probably a good idea.
> > >
> > > > > @@ -1669,10 +1672,7 @@ static int blk_trace_setup_queue(struct request_queue *q,
> > > > >
> > > > >  	blk_trace_setup_lba(bt, bdev);
> > > > >
> > > > > -	ret = -EBUSY;
> > > > > -	if (cmpxchg(&q->blk_trace, NULL, bt))
> > > > > -		goto free_bt;
> > > > > -
> > > > > +	rcu_assign_pointer(q->blk_trace, bt);
> > > > >  	get_probe_ref();
> > > > >  	return 0;
> > > >
> > > > This changes a conditional assignment of q->blk_trace into an
> > > > unconditional assignment. Shouldn't q->blk_trace only be assigned if
> > > > q->blk_trace == NULL?
> > >
> > > Yes, but both callers of blk_trace_setup_queue() actually check that
> > > q->blk_trace is NULL before calling blk_trace_setup_queue() and since we
> > > hold blk_trace_mutex all the time, the value of q->blk_trace cannot change.
> > > So the conditional assignment was just bogus.
> >
> > If you run a blktrace against a different partition, the check does have
> > an effect today. This is because the request_queue is shared between
> > partitions implicitly, even though they end up using a different struct
> > dentry. So the check is actually still needed; however, my change adds
> > this check early as well so we don't do a memory allocation just to
> > throw it away.
>
> I'm not sure we are speaking about the same check, but I might be missing
> something. blk_trace_setup_queue() is only called from
> sysfs_blk_trace_attr_store(). That does:
>
> 	mutex_lock(&q->blk_trace_mutex);
>
> 	bt = rcu_dereference_protected(q->blk_trace,
> 				       lockdep_is_held(&q->blk_trace_mutex));
> 	if (attr == &dev_attr_enable) {
> 		if (!!value == !!bt) {
> 			ret = 0;
> 			goto out_unlock_bdev;
> 		}
>
> ^^^ So if 'bt' is non-NULL and we are enabling, we bail
> instead of calling blk_trace_setup_queue().
>
> Similarly later:
>
> 	if (bt == NULL) {
> 		ret = blk_trace_setup_queue(q, bdev);
> 	...
>
> so we again call blk_trace_setup_queue() only if bt is NULL. So IMO the
> cmpxchg() in blk_trace_setup_queue() could never fail to set the value.
> Am I missing something?

I believe we are talking about the same check indeed. Consider the
situation not as a race, but instead as the state machine of the ioctl.
The BLKTRACESETUP goes first, and when that is over we have not yet run
BLKTRACESTART. So, prior to BLKTRACESTART, another BLKTRACESETUP can run
against another partition. At that point we have two users trying to use
the same request_queue and the same q->blk_trace, even though this was
well protected with the mutex. And so the final check is needed to
ensure we only give one of the users the right to blktrace.

Did I misunderstand the check though?

  Luis