> -----Original Message----- > From: Zheng Hacker <hackerzheng666@xxxxxxxxx> > Sent: Thursday, March 23, 2023 9:15 AM > To: Mike Christie <michael.christie@xxxxxxxxxx> > Cc: Zheng Wang <zyytlz.wz@xxxxxxx>; Nilesh Javali <njavali@xxxxxxxxxxx>; > Manish Rangankar <mrangankar@xxxxxxxxxxx>; GR-QLogic-Storage- > Upstream <GR-QLogic-Storage-Upstream@xxxxxxxxxxx>; > jejb@xxxxxxxxxxxxx; martin.petersen@xxxxxxxxxx; linux- > scsi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > 1395428693sheep@xxxxxxxxx; alex000young@xxxxxxxxx > Subject: [EXT] Re: [PATCH RESEND] scsi: qedi: Fix use after free bug in > qedi_remove due to race condition > > External Email > > ---------------------------------------------------------------------- > Mike Christie <michael.christie@xxxxxxxxxx> 于2023年3月21日周二 00:11写 > 道: > > > > On 3/18/23 3:13 AM, Zheng Wang wrote: > > > In qedi_probe, it calls __qedi_probe, which bound > > > &qedi->recovery_work with qedi_recovery_handler and bound > > > &qedi->board_disable_work with qedi_board_disable_work. > > > > > > When it calls qedi_schedule_recovery_handler, it will finally call > > > schedule_delayed_work to start the work. > > > > > > When we call qedi_remove to remove the driver, there may be a > > > sequence as follows: > > > > > > Fix it by finishing the work before cleanup in qedi_remove. > > > > > > CPU0 CPU1 > > > > > > |qedi_recovery_handler > > > qedi_remove | > > > __qedi_remove | > > > iscsi_host_free | > > > scsi_host_put | > > > //free shost | > > > |iscsi_host_for_each_session > > > |//use qedi->shost > > > > > > Fixes: 4b1068f5d74b ("scsi: qedi: Add MFW error recovery process") > > > Signed-off-by: Zheng Wang <zyytlz.wz@xxxxxxx> > > > --- > > > drivers/scsi/qedi/qedi_main.c | 4 ++++ > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/drivers/scsi/qedi/qedi_main.c > > > b/drivers/scsi/qedi/qedi_main.c index f2ee49756df8..25223f6f5344 > > > 100644 > > > --- a/drivers/scsi/qedi/qedi_main.c > > > +++ b/drivers/scsi/qedi/qedi_main.c > > > @@ -2414,6 +2414,10 @@ static void __qedi_remove(struct pci_dev > *pdev, int mode) > > > int rval; > > > u16 retry = 10; > > > > > > + /*cancel work*/ > > > > This comment is not needed. The name of the functions you are calling > > have "cancel" and "work" in them so we know. If you want to add a > > comment explain why the cancel calls are needed here. > > > > Hi, > > Sorry for my late reply and thanks for your advice. Will remove it in the next > version of patch. > > > > > > + cancel_delayed_work_sync(&qedi->recovery_work); > > > + cancel_delayed_work_sync(&qedi->board_disable_work); > > > > > > How do you know after you have called cancel_delayed_work_sync that > > schedule_recovery_handler or schedule_hw_err_handler can't be called? > > I don't know the qed driver well, but it looks like you could have > > operations still running, so after you cancel here one of those ops > > could lead to them scheduling the work again. > > > > Sorry I didn't know how to make sure there's no more schedule. But I do > think this is important. Maybe there're someone else who can give us advice. > > Best regards, > Zheng > > Best place to call cancel_delayed_work_sync is after qedi_ops->stop(qedi->cdev) and qedi_ops->ll2->stop(qedi->cdev);, after these qed calls firmware will not post any events to qedi driver. Thanks, Manish