On 12/5/18 10:55 AM, Guenter Roeck wrote:
> On Tue, Dec 04, 2018 at 07:25:05PM -0700, Jens Axboe wrote:
>> On 12/4/18 6:38 PM, Guenter Roeck wrote:
>>> On Tue, Dec 04, 2018 at 03:47:46PM -0700, Jens Axboe wrote:
>>>> If we attempt a direct issue to a SCSI device, and it returns BUSY, then
>>>> we queue the request up normally. However, the SCSI layer may have
>>>> already setup SG tables etc for this particular command. If we later
>>>> merge with this request, then the old tables are no longer valid. Once
>>>> we issue the IO, we only read/write the original part of the request,
>>>> not the new state of it.
>>>>
>>>> This causes data corruption, and is most often noticed with the file
>>>> system complaining about the just read data being invalid:
>>>>
>>>> [ 235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)
>>>>
>>>> because most of it is garbage...
>>>>
>>>> This doesn't happen from the normal issue path, as we will simply defer
>>>> the request to the hardware queue dispatch list if we fail. Once it's on
>>>> the dispatch list, we never merge with it.
>>>>
>>>> Fix this from the direct issue path by flagging the request as
>>>> REQ_NOMERGE so we don't change the size of it before issue.
>>>>
>>>> See also:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=201685
>>>>
>>>> Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>
>>> Tested-by: Guenter Roeck <linux@xxxxxxxxxxxx>
>>>
>>> ... on two systems affected by the problem.
>>
>> Thanks for testing! And for being persistent in reproducing and
>> providing clues for getting this nailed.
>>
>
> My pleasure.
>
> I see that there is some discussion about this patch.
>
> Unfortunately, everyone running a 4.19 or later kernel is at serious
> risk of data corruption.
> Given that, if this patch doesn't make it
> upstream for one reason or another, would it be possible to at least
> revert the two patches introducing the problem until this is sorted
> out for good ? If this is not acceptable either, maybe mark blk-mq
> as broken ? After all, it _is_ broken. This is even more true if it
> turns out that a problem may exist since 4.1, as suggested in the
> discussion.

It is queued up, it'll go upstream later today.

> Also, it seems to me that even with this problem fixed, blk-mq may not
> be ready for primetime after all. With that in mind, maybe commit
> d5038a13eca72 ("scsi: core: switch to scsi-mq by default") was a
> bit premature. Should that be reverted ?

I have to strongly disagree with that, the timing is just unfortunate.
There are literally millions of machines running blk-mq/scsi-mq, and
this is the only hiccup we've had. So I want to put this one to rest
once and for all, there's absolutely no reason not to continue with
what we've planned.

-- 
Jens Axboe
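
For readers following the thread, the failure mode described in the quoted commit message can be sketched as a small user-space model. This is not kernel code and not the actual patch: the `Request` class, the helper functions, and the value of the `REQ_NOMERGE` constant are all hypothetical stand-ins, and "SG tables" are modelled as a plain snapshot of the request's sectors taken at direct-issue time.

```python
from dataclasses import dataclass

# Toy flag bit; a stand-in for the kernel's no-merge marking, not its real value.
REQ_NOMERGE = 1 << 0


@dataclass
class Request:
    sectors: list          # what the request currently covers
    flags: int = 0
    prepared: list = None  # snapshot standing in for already-built SG tables


def direct_issue(rq, device_busy, mark_nomerge):
    """Prepare the request, then try to issue it directly."""
    rq.prepared = list(rq.sectors)  # driver builds its tables from the current size
    if not device_busy:
        return "issued"
    # Device returned BUSY: the request is queued, but it stays prepared.
    if mark_nomerge:
        rq.flags |= REQ_NOMERGE     # the idea behind the fix
    return "queued"


def try_merge(rq, extra):
    """Grow a queued request unless it is flagged as not mergeable."""
    if rq.flags & REQ_NOMERGE:
        return False
    rq.sectors.extend(extra)
    return True


def issue(rq):
    """A later issue transfers only what the prepared state covers."""
    return rq.prepared if rq.prepared is not None else rq.sectors


def run(mark_nomerge):
    rq = Request(sectors=[0, 1])
    direct_issue(rq, device_busy=True, mark_nomerge=mark_nomerge)
    merged = try_merge(rq, [2, 3])
    complete = issue(rq) == rq.sectors
    return merged, complete
```

In this model, `run(False)` merges into the already-prepared request and then transfers only the original two sectors, which corresponds to the stale-table case from the commit message. `run(True)` refuses the merge, so what gets transferred matches what the request covers.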