On 8/13/20 8:57 AM, Douglas Gilbert wrote: > John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures > that were mainly seen on aarch64 machines (e.g. RPi 4 with four > A72 CPUs). The problem was tracked down to a missing critical > section on a "short circuit" path. Namely, the time to process > the current command so far has already exceeded the requested > command duration (i.e. the number of nanoseconds in the ndelay > parameter). > > The random=1 parameter setting was pivotal in finding this error. > The failure scenario involved first taking that "short circuit" > path (due to a very short command duration) and then taking the > more likely hrtimer_start() path (due to a longer command > duration). With random=1 each command's duration is taken from > the uniformly distributed [0..ndelay) interval. > The fio utility also helped by reliably generating the error > scenario at about once per minute on a RPi 4 (64 bit OS). > > Reported-by: John Garry <john.garry@xxxxxxxxxx> > Signed-off-by: Douglas Gilbert <dgilbert@xxxxxxxxxxxx> > --- > drivers/scsi/scsi_debug.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c > index d95822dceeb6..4b4e31af22bd 100644 > --- a/drivers/scsi/scsi_debug.c > +++ b/drivers/scsi/scsi_debug.c > @@ -5471,9 +5471,11 @@ static int schedule_resp(struct scsi_cmnd *cmnd, struct sdebug_dev_info *devip, > u64 d = ktime_get_boottime_ns() - ns_from_boot; > > if (kt <= d) { /* elapsed duration >= kt */ > + spin_lock_irqsave(&sqp->qc_lock, iflags); > sqcp->a_cmnd = NULL; > atomic_dec(&devip->num_in_q); > clear_bit(k, sqp->in_use_bm); > + spin_unlock_irqrestore(&sqp->qc_lock, iflags); > if (new_sd_dp) > kfree(sd_dp); > /* call scsi_done() from this thread */ > Reviewed-by: Lee Duncan <lduncan@xxxxxxxx>