Hello Arun, On Wed, 2020-04-01 at 10:12 -0700, Arun Easi wrote: > On Fri, 27 Mar 2020, 9:47am, mwilck@xxxxxxxx wrote: > > > External Email > > > > ----------------------------------------------------------------- > > ----- > > From: Martin Wilck <mwilck@xxxxxxxx> > > > > This reverts commit c3b6a1d397420a0fdd97af2f06abfb78adc370df. > > Aborting the sleep was risky, because after return from > > qlt_free_session_done() the driver starts freeing resources, > > which is dangerous while we know that there's pending IO. > > > > The previous patch "scsi: qla2xxx: check UNLOADING before posting > > async > > work" avoids sending this IO in the first place, and thus obsoletes > > the dangerous timeout. > > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > --- > > drivers/scsi/qla2xxx/qla_target.c | 4 ---- > > 1 file changed, 4 deletions(-) > > > > diff --git a/drivers/scsi/qla2xxx/qla_target.c > > b/drivers/scsi/qla2xxx/qla_target.c > > index 622e733..eec1338 100644 > > --- a/drivers/scsi/qla2xxx/qla_target.c > > +++ b/drivers/scsi/qla2xxx/qla_target.c > > @@ -1019,7 +1019,6 @@ void qlt_free_session_done(struct work_struct > > *work) > > > > if (logout_started) { > > bool traced = false; > > - u16 cnt = 0; > > > > while (!READ_ONCE(sess->logout_completed)) { > > if (!traced) { > > @@ -1029,9 +1028,6 @@ void qlt_free_session_done(struct work_struct > > *work) > > traced = true; > > } > > msleep(100); > > - cnt++; > > - if (cnt > 200) > > - break; > > By taking this code out, it would leave a stuck FC target delete > thread > and thus preventing the module unload itself, in case of a bug in > this > logic (which was seen in some instances). > > How about increasing the wait time to say 25 seconds (typical worst > case > is 2 * RA_TOV = 20 seconds) and then alerting user with a "WARN", > but > still break out? > I've seen the kernel crash because of this breakout. A crash happens if another code path was blocked because of either the stopped FW or anything else, and would time out after this code. The actual crash I've seen is already been fixed by patch 1 of this series, because qla2x00_terminate_rport_io() checks UNLOADING before sending IO. So if it was just about that, we could keep the breakout. I suspect that the "unbound sleep" occurred only because of the hanging access to stopped FW (note that c3b6a1d39742 was made just ~2 months after 45235022da99 "scsi: qla2xxx: Fix driver unload by shutting down chip"). If that's true, the hang wouldn't occur any more with patch 1-2 of my series applied. If it's wrong and the hang does occur again, IMO we should find out what is hanging, and fix it, rather than blindly quit this wait loop and free resources. But if you want to keep it in, I'm not going to insist; I'm more interested in getting the first two patches merged. Regards, Martin (Btw, I suppose you'll admit the combination of msleep() and counting isn't the state of the art of implementing a timeout in the kernel). > Regards, > -Arun > > > } > > > > ql_dbg(ql_dbg_disc, vha, 0xf087, > >