On 1/17/23 14:03, Martin Wilck wrote:
On Tue, 2023-01-17 at 13:52 -0800, Bart Van Assche wrote:
On 1/17/23 13:48, Martin Wilck wrote:
Yes, that was my suggestion. Just defer the scsi_device_put() call
in
alua_rtpg_queue() in the case where the actual RTPG handler is not
queued. I won't have time for that before next week though.
Hi Martin,
Do you agree that the call trace shared by Steffen is not sufficient
to
conclude that this change is necessary?
Hmm, I suppose I missed your point... to re-iterate my thinking:
1 alua_queue_rtpg() must take a ref to the sdev before queueing work,
whether or not the caller already has one
2 queue_delayed_work() can fail
3 if queue_delayed_work() fails, alua_queue_rtpg() must drop the ref
it just took
4 BUT (and this is what I guess I missed) this ref can't be the last
one dropped, because the caller of alua_rtpg_queue() must still hold
a reference. And scsi_device_put() only sleeps if the last ref is
dropped. Therefore the issue in Steffen's call stack should
indeed be fixed just by removing the might_sleep(). If all callers
callers of alua_rtpg_queue() must hold an sdev reference (I believe
they do), we can indeed remove the might_sleep() entirely.
Is this correct reasoning, and what you meant previously? If yes, I
agree, and I apologize for not realizing it in the first place.
But I think this is subtle enough to deserve a comment in the code.
Yes, that's what I'm thinking.
How about the patch below?
Thanks,
Bart.
[PATCH] scsi: device_handler: alua: Remove a might_sleep() annotation
The might_sleep() annotation in alua_rtpg_queue() is not correct since the
command completion code may call this function from atomic context.
Calling alua_rtpg_queue() from atomic context in the command completion
path is fine since request submitters must hold an sdev reference until
command execution has completed. This patch fixes the following kernel
warning:
BUG: sleeping function called from invalid context at drivers/scsi/device_handler/scsi_dh_alua.c:992
Call Trace:
dump_stack_lvl+0xac/0x100
__might_resched+0x284/0x2c8
alua_rtpg_queue+0x3c/0x98 [scsi_dh_alua]
alua_check+0x122/0x250 [scsi_dh_alua]
alua_check_sense+0x172/0x228 [scsi_dh_alua]
scsi_check_sense+0x8a/0x2e0
scsi_decide_disposition+0x286/0x298
scsi_complete+0x6a/0x108
blk_complete_reqs+0x6e/0x88
__do_softirq+0x13e/0x6b8
__irq_exit_rcu+0x14a/0x170
irq_exit_rcu+0x22/0x50
do_ext_irq+0x10a/0x1d0
Reported-by: Steffen Maier <maier@xxxxxxxxxxxxx>
Cc: Steffen Maier <maier@xxxxxxxxxxxxx>
Cc: Martin Wilck <mwilck@xxxxxxxx>
Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 55a5073248f8..362fa631f39b 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -987,6 +987,9 @@ static void alua_rtpg_work(struct work_struct *work)
*
* Returns true if and only if alua_rtpg_work() will be called asynchronously.
* That function is responsible for calling @qdata->fn().
+ *
+ * Context: may be called from atomic context (alua_check()) only if the caller
+ * holds an sdev reference.
*/
static bool alua_rtpg_queue(struct alua_port_group *pg,
struct scsi_device *sdev,
@@ -995,8 +998,6 @@ static bool alua_rtpg_queue(struct alua_port_group *pg,
int start_queue = 0;
unsigned long flags;
- might_sleep();
-
if (WARN_ON_ONCE(!pg) || scsi_device_get(sdev))
return false;