On Fri, Mar 22, 2019 at 9:36 AM Ming Lei <tom.leiming@xxxxxxxxx> wrote:
>
> On Fri, Mar 22, 2019 at 2:39 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> >
> > On Sat, 2019-03-16 at 10:09 +0800, Jason Yan wrote:
> > > If we remove the scsi disk when running io with fio, an oops occurred
> > > under the following condition.
> > >
> > > [scsi_eh_0]                             [fio]
> > > scsi_end_request
> > >   ->blk_update_request
> > >     ->end_bio(io returned to userspace)
> > >                                         close
> > >                                           ->sd_release
> > >                                             ->scsi_disk_put
> > >                                               ->scsi_disk_release
> > >                                                 ->disk->private_data = NULL;
> > >
> > >   ->scsi_mq_uninit_cmd
> > >     ->scsi_uninit_cmd
> > >       ->scsi_cmd_to_driver
> > >         ->drv is NULL, Oops
> > >
> > > There is a small window between blk_update_request() and
> > > scsi_mq_uninit_cmd() in which the scsi disk may have been released.
> > > This will cause an oops like below:
> > >
> > > Unable to handle kernel NULL pointer dereference at virtual address
> > > 0000000000000000
> > > s/sync.c:67, func=xfer, error=In[11347.116050] Mem abort info:
> > > put/output error
> > > [11347.121598] ESR = 0x96000006
> > > [11347.126200] Exception class = DABT (current EL), IL = 32 bits
> > > [11347.132117] SET = 0, FnV = 0
> > > [11347.135170] EA = 0, S1PTW = 0
> > > [11347.138308] Data abort info:
> > > [11347.141186] ISV = 0, ISS = 0x00000006
> > > [11347.145019] CM = 0, WnR = 0
> > > [11347.147977] user pgtable: 4k pages, 48-bit VAs, pgdp =
> > > 00000000a67aece2
> > > [11347.154591] [0000000000000000] pgd=0000002f90774003,
> > > pud=0000002fab098003, pmd=0000000000000000
> > > [11347.163304] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> > > [11347.168870] Modules linked in: hisi_sas_v3_hw hisi_sas_main libsas
> > > [11347.175044] CPU: 56 PID: 4294 Comm: scsi_eh_2 Not tainted
> > > 4.19.0-g8052059-dirty #2
> > > [11347.182600] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI
> > > RC0 - B601 (V6.01) 11/08/2018
> > > [11347.191370] pstate: a0c00009 (NzCv daif
> >
> > Please verify whether the following
> > patch is a valid alternative for your patch:
> >
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index ed34bfbc3844..745ffdda1bc1 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -1408,6 +1408,7 @@ static void sd_release(struct gendisk *disk, fmode_t mode)
> >  {
> >         struct scsi_disk *sdkp = scsi_disk(disk);
> >         struct scsi_device *sdev = sdkp->device;
> > +       struct request_queue *q = sdkp->disk->queue;
> >
> >         SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_release\n"));
> >
> > @@ -1417,9 +1418,12 @@ static void sd_release(struct gendisk *disk, fmode_t mode)
> >         }
> >
> >         /*
> > -        * XXX and what if there are packets in flight and this close()
> > -        * XXX is followed by a "rmmod sd_mod"?
> > +        * Wait until any requests that are in progress have completed.
> > +        * This is necessary to avoid that e.g. scsi_end_request() crashes
> > +        * due to scsi_disk_release() clearing the disk->private_data pointer.
> >          */
> > +       blk_mq_freeze_queue(q);
> > +       blk_mq_unfreeze_queue(q);
>
> It is overkill to drain any requests here; what we want is just to
> drain any in-flight IO requests.

It is not only overkill; it can actually cause a big performance issue,
since any block/scsi utility may trigger the freeze/unfreeze.

Thanks,
Ming Lei
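
For illustration only, the ordering problem discussed in this thread can be modelled in plain userspace C. This is a minimal single-threaded sketch, not kernel code: every name in it (model_disk, model_release() and so on) is hypothetical, it does not model blk_mq_freeze_queue(), and it only shows why the release path has to wait for requests that have completed to userspace but have not yet been uninitialised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_driver { const char *name; };

struct model_disk {
    struct model_driver *private_data; /* stands in for disk->private_data */
    int in_flight;                     /* requests not yet uninitialised */
};

/* A request is issued against the disk. */
static void model_start_request(struct model_disk *d)
{
    d->in_flight++;
}

/*
 * Stands in for the late uninit step that needs the driver pointer.
 * Returns 0 on success, -1 if the driver pointer was already cleared
 * (the point at which the real code would dereference NULL).
 */
static int model_uninit_request(struct model_disk *d)
{
    int driver_present = (d->private_data != NULL);

    d->in_flight--;
    return driver_present ? 0 : -1;
}

/*
 * Stands in for the release path. With wait_for_in_flight set, it
 * refuses to clear private_data while requests are still in flight
 * and returns -1 so the caller can retry later.
 */
static int model_release(struct model_disk *d, int wait_for_in_flight)
{
    if (wait_for_in_flight && d->in_flight > 0)
        return -1;
    d->private_data = NULL;
    return 0;
}

/* Runs both orderings; returns 0 if they behave as described. */
static int model_demo(void)
{
    struct model_driver drv = { "sd" };
    struct model_disk racy = { &drv, 0 };
    struct model_disk safe = { &drv, 0 };

    /* Racy ordering: release clears the pointer before uninit runs. */
    model_start_request(&racy);
    assert(model_release(&racy, 0) == 0);
    assert(model_uninit_request(&racy) == -1);

    /* Safe ordering: release is deferred until nothing is in flight. */
    model_start_request(&safe);
    assert(model_release(&safe, 1) == -1);
    assert(model_uninit_request(&safe) == 0);
    assert(model_release(&safe, 1) == 0);
    return 0;
}
```

Calling model_demo() from a test harness exercises both cases. The model only tracks requests already in flight, which is the narrower condition asked for above; it says nothing about how such tracking would be implemented in sd.c.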