On Thu, 2019-01-17 at 16:05 -0500, Douglas Gilbert wrote: +AD4 On 2019-01-16 8:06 p.m., Bart Van Assche wrote: +AD4 +AD4 On Wed, 2019-01-16 at 19:54 -0500, Douglas Gilbert wrote: +AD4 +AD4 +AD4 On 2019-01-16 6:56 p.m., Bart Van Assche wrote: +AD4 +AD4 +AD4 +AD4 On Wed, 2019-01-16 at 10:57 -0500, Douglas Gilbert wrote: +AD4 +AD4 +AD4 +AD4 +AD4 The block layer assumes scsi+AF8-request:sense is always a valid +AD4 +AD4 +AD4 +AD4 +AD4 pointer. This is set up once in scsi+AF8-mq+AF8-init+AF8-request() and the +AD4 +AD4 +AD4 +AD4 +AD4 containing scsi+AF8-cmnd object is used often, being re-initialized +AD4 +AD4 +AD4 +AD4 +AD4 by scsi+AF8-init+AF8-command(). That works unless some code re-purposes +AD4 +AD4 +AD4 +AD4 +AD4 part of the scsi+AF8-cmnd object for something else. And that is +AD4 +AD4 +AD4 +AD4 +AD4 what bidi handling does in scsi+AF8-mq+AF8-prep+AF8-fn(). The result is an +AD4 +AD4 +AD4 +AD4 +AD4 oops at some later time when the partly overwritten object is +AD4 +AD4 +AD4 +AD4 +AD4 re-used. The overwrite is from d285203cf647d but 'git blame' +AD4 +AD4 +AD4 +AD4 +AD4 does not show removed code, so that commit may not be the +AD4 +AD4 +AD4 +AD4 +AD4 culprit. +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 Signed-off-by: Douglas Gilbert +ADw-dgilbert+AEA-interlog.com+AD4 +AD4 +AD4 +AD4 +AD4 +AD4 --- +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 This was found while injecting errors (thus generating sense data) +AD4 +AD4 +AD4 +AD4 +AD4 into a sequence of bidi commands. At some later time the block +AD4 +AD4 +AD4 +AD4 +AD4 layer blew up with a scsi+AF8-request::sense NULL dereference in +AD4 +AD4 +AD4 +AD4 +AD4 sg+AF8-rq+AF8-end+AF8-io(). Without testing I'm confident the bsg driver, +AD4 +AD4 +AD4 +AD4 +AD4 the osd ULD and exofs are exposed to this bug. +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 drivers/scsi/scsi+AF8-lib.c +AHw 1 +- +AD4 +AD4 +AD4 +AD4 +AD4 1 file changed, 1 insertion(+-) +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 diff --git a/drivers/scsi/scsi+AF8-lib.c b/drivers/scsi/scsi+AF8-lib.c +AD4 +AD4 +AD4 +AD4 +AD4 index b13cc9288ba0..71259bd4040a 100644 +AD4 +AD4 +AD4 +AD4 +AD4 --- a/drivers/scsi/scsi+AF8-lib.c +AD4 +AD4 +AD4 +AD4 +AD4 +-+-+- b/drivers/scsi/scsi+AF8-lib.c +AD4 +AD4 +AD4 +AD4 +AD4 +AEAAQA -1175,6 +-1175,7 +AEAAQA void scsi+AF8-init+AF8-command(struct scsi+AF8-device +ACo-dev, struct scsi+AF8-cmnd +ACo-cmd) +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 cmd-+AD4-device +AD0 dev+ADs +AD4 +AD4 +AD4 +AD4 +AD4 cmd-+AD4-sense+AF8-buffer +AD0 buf+ADs +AD4 +AD4 +AD4 +AD4 +AD4 +- cmd-+AD4-req.sense +AD0 buf+ADs +AD4 +AD4 +AD4 +AD4 +AD4 cmd-+AD4-prot+AF8-sdb +AD0 prot+ADs +AD4 +AD4 +AD4 +AD4 +AD4 cmd-+AD4-flags +AD0 flags+ADs +AD4 +AD4 +AD4 +AD4 +AD4 INIT+AF8-DELAYED+AF8-WORK(+ACY-cmd-+AD4-abort+AF8-work, scmd+AF8-eh+AF8-abort+AF8-handler)+ADs +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 Hi Doug, +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 The description of this patch does not look correct to me. scsi+AF8-init+AF8-command() +AD4 +AD4 +AD4 +AD4 does not overwrite the sense pointer. From the body of that function: +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 /+ACo zero out the cmd, except for the embedded scsi+AF8-request +ACo-/ +AD4 +AD4 +AD4 +AD4 memset((char +ACo)cmd +- sizeof(cmd-+AD4-req), 0, +AD4 +AD4 +AD4 +AD4 sizeof(+ACo-cmd) - sizeof(cmd-+AD4-req) +- dev-+AD4-host-+AD4-hostt-+AD4-cmd+AF8-size)+ADs +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 It is not clear to me which code overwrites the sense pointer. I think that +AD4 +AD4 +AD4 +AD4 needs to be figured out before discussion of this patch can continue. +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 Bart, +AD4 +AD4 +AD4 Please re-read the explanation. The two sentences in the middle should tell +AD4 +AD4 +AD4 you that you are looking at the wrong memset(). +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 And I'm waiting for the person who wrote the questionable code to comment. +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 +AD4 If you don't believe me, try sending a device reset to a scsi+AF8-debug device. +AD4 +AD4 +AD4 Then use sg+AF8-raw +ACoAKgAq to send a XDWRITREAD(10) bidi command to the same scsi+AF8-debug +AD4 +AD4 +AD4 device, you should get a Unit Attention concerning that device reset. If +AD4 +AD4 +AD4 you do, keep sending that bidi command. Make sure you have no important +AD4 +AD4 +AD4 files open because that machine will lock solid. Basically what happens +AD4 +AD4 +AD4 when a rq+AF8-end+AF8-io() callback de-references a NULL pointer. It looks like +AD4 +AD4 +AD4 it has been there since 2014 and took me 2 days to find. Sorry that I can't +AD4 +AD4 +AD4 explain it in one simple sentence. +AD4 +AD4 +AD4 +AD4 Hi Doug, +AD4 +AD4 +AD4 +AD4 Is this perhaps the memset you are referring to? +AD4 +AD4 +AD4 +AD4 memset(bidi+AF8-sdb, 0, sizeof(struct scsi+AF8-data+AF8-buffer))+ADs +AD4 +AD4 Yes. Hi Doug, I think it's worse than what you explained in your patch description: some code interprets blk+AF8-mq+AF8-rq+AF8-to+AF8-pdu(cmd-+AD4-request-+AD4-next+AF8-rq) as a struct scsi+AF8-data+AF8-buffer, e.g. scsi+AF8-mq+AF8-prep+AF8-fn(). Other code in the SCSI core interprets the same data structure as a struct scsi+AF8-request, e.g. scsi+AF8-io+AF8-completion(). So I don't think that your patch is sufficient. I have already started working on a patch series that clears up this confusion. Bart.