On 2019-01-16 6:56 p.m., Bart Van Assche wrote:
On Wed, 2019-01-16 at 10:57 -0500, Douglas Gilbert wrote:
The block layer assumes scsi_request:sense is always a valid
pointer. This is set up once in scsi_mq_init_request() and the
containing scsi_cmnd object is used often, being re-initialized
by scsi_init_command(). That works unless some code re-purposes
part of the scsi_cmnd object for something else. And that is
what bidi handling does in scsi_mq_prep_fn(). The result is an
oops at some later time when the partly overwritten object is
re-used. The overwrite is from d285203cf647d but 'git blame'
does not show removed code, so that commit may not be the
culprit.
Signed-off-by: Douglas Gilbert <dgilbert@xxxxxxxxxxxx>
---
This was found while injecting errors (thus generating sense data)
into a sequence of bidi commands. At some later time the block
layer blew up with a scsi_request::sense NULL dereference in
sg_rq_end_io(). Without testing I'm confident the bsg driver,
the osd ULD and exofs are exposed to this bug.
drivers/scsi/scsi_lib.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b13cc9288ba0..71259bd4040a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1175,6 +1175,7 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd)
cmd->device = dev;
cmd->sense_buffer = buf;
+ cmd->req.sense = buf;
cmd->prot_sdb = prot;
cmd->flags = flags;
INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
Hi Doug,
The description of this patch does not look correct to me. scsi_init_command()
does not overwrite the sense pointer. From the body of that function:
/* zero out the cmd, except for the embedded scsi_request */
memset((char *)cmd + sizeof(cmd->req), 0,
sizeof(*cmd) - sizeof(cmd->req) + dev->host->hostt->cmd_size);
It is not clear to me which code overwrites the sense pointer. I think that
needs to be figured out before discussion of this patch can continue.
Bart,
Please re-read the explanation. The two sentences in the middle should tell
you that you are looking at the wrong memset().
And I'm waiting for the person who wrote the questionable code to comment.
If you don't believe me, try sending a device reset to a scsi_debug device.
Then use sg_raw *** to send a XDWRITREAD(10) bidi command to the same scsi_debug
device, you should get a Unit Attention concerning that device reset. If
you do, keep sending that bidi command. Make sure you have no important
files open because that machine will lock solid. Basically what happens
when a rq_end_io() callback de-references a NULL pointer. It looks like
it has been there since 2014 and took me 2 days to find. Sorry that I can't
explain it in one simple sentence.
Douglas
*** Use 'man sg_raw' and see the EXAMPLES section (second to last example)
You can use a bsg device as shown.