On Tue, 2023-01-17 at 16:29 -0800, Bart Van Assche wrote: > On 1/17/23 14:03, Martin Wilck wrote: > > On Tue, 2023-01-17 at 13:52 -0800, Bart Van Assche wrote: > > > On 1/17/23 13:48, Martin Wilck wrote: > > > > Yes, that was my suggestion. Just defer the scsi_device_put() > > > > call > > > > in > > > > alua_rtpg_queue() in the case where the actual RTPG handler is > > > > not > > > > queued. I won't have time for that before next week though. > > > > > > Hi Martin, > > > > > > Do you agree that the call trace shared by Steffen is not > > > sufficient > > > to > > > conclude that this change is necessary? > > > > Hmm, I suppose I missed your point... to re-iterate my thinking: > > > > 1 alua_queue_rtpg() must take a ref to the sdev before queueing > > work, > > whether or not the caller already has one > > 2 queue_delayed_work() can fail > > 3 if queue_delayed_work() fails, alua_queue_rtpg() must drop the > > ref > > it just took > > 4 BUT (and this is what I guess I missed) this ref can't be the > > last > > one dropped, because the caller of alua_rtpg_queue() must still > > hold > > a reference. And scsi_device_put() only sleeps if the last ref > > is > > dropped. Therefore the issue in Steffen's call stack should > > indeed be fixed just by removing the might_sleep(). If all > > callers > > callers of alua_rtpg_queue() must hold an sdev reference (I > > believe > > they do), we can indeed remove the might_sleep() entirely. > > > > Is this correct reasoning, and what you meant previously? If yes, I > > agree, and I apologize for not realizing it in the first place. > > But I think this is subtle enough to deserve a comment in the code. > > Yes, that's what I'm thinking. > > How about the patch below? > > Thanks, > > Bart. > > [PATCH] scsi: device_handler: alua: Remove a might_sleep() annotation > > The might_sleep() annotation in alua_rtpg_queue() is not correct > since the > command completion code may call this function from atomic context. > Calling alua_rtpg_queue() from atomic context in the command > completion > path is fine since request submitters must hold an sdev reference > until > command execution has completed. This patch fixes the following > kernel > warning: > > BUG: sleeping function called from invalid context at > drivers/scsi/device_handler/scsi_dh_alua.c:992 > Call Trace: > dump_stack_lvl+0xac/0x100 > __might_resched+0x284/0x2c8 > alua_rtpg_queue+0x3c/0x98 [scsi_dh_alua] > alua_check+0x122/0x250 [scsi_dh_alua] > alua_check_sense+0x172/0x228 [scsi_dh_alua] > scsi_check_sense+0x8a/0x2e0 > scsi_decide_disposition+0x286/0x298 > scsi_complete+0x6a/0x108 > blk_complete_reqs+0x6e/0x88 > __do_softirq+0x13e/0x6b8 > __irq_exit_rcu+0x14a/0x170 > irq_exit_rcu+0x22/0x50 > do_ext_irq+0x10a/0x1d0 > > Reported-by: Steffen Maier <maier@xxxxxxxxxxxxx> > Cc: Steffen Maier <maier@xxxxxxxxxxxxx> > Cc: Martin Wilck <mwilck@xxxxxxxx> > Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx> Reviewed-by: Martin Wilck <mwilck@xxxxxxxx> > --- > drivers/scsi/device_handler/scsi_dh_alua.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c > b/drivers/scsi/device_handler/scsi_dh_alua.c > index 55a5073248f8..362fa631f39b 100644 > --- a/drivers/scsi/device_handler/scsi_dh_alua.c > +++ b/drivers/scsi/device_handler/scsi_dh_alua.c > @@ -987,6 +987,9 @@ static void alua_rtpg_work(struct work_struct > *work) > * > * Returns true if and only if alua_rtpg_work() will be called > asynchronously. > * That function is responsible for calling @qdata->fn(). > + * > + * Context: may be called from atomic context (alua_check()) only if > the caller > + * holds an sdev reference. > */ > static bool alua_rtpg_queue(struct alua_port_group *pg, > struct scsi_device *sdev, > @@ -995,8 +998,6 @@ static bool alua_rtpg_queue(struct > alua_port_group *pg, > int start_queue = 0; > unsigned long flags; > > - might_sleep(); > - > if (WARN_ON_ONCE(!pg) || scsi_device_get(sdev)) > return false; > >