Re: [PATCH] blk-mq: fix corruption with direct issue

Guenter Roeck <linux@xxxxxxxxxxxx> · Wed, 5 Dec 2018 09:55:54 -0800

On Tue, Dec 04, 2018 at 07:25:05PM -0700, Jens Axboe wrote:
> On 12/4/18 6:38 PM, Guenter Roeck wrote:
> > On Tue, Dec 04, 2018 at 03:47:46PM -0700, Jens Axboe wrote:
> >> If we attempt a direct issue to a SCSI device, and it returns BUSY, then
> >> we queue the request up normally. However, the SCSI layer may have
> >> already set up SG tables etc. for this particular command. If we later
> >> merge with this request, then the old tables are no longer valid. Once
> >> we issue the IO, we only read/write the original part of the request,
> >> not the new state of it.
> >>
> >> This causes data corruption, and is most often noticed with the file
> >> system complaining about the just read data being invalid:
> >>
> >> [  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)
> >>
> >> because most of it is garbage...
> >>
> >> This doesn't happen from the normal issue path, as we will simply defer
> >> the request to the hardware queue dispatch list if we fail. Once it's on
> >> the dispatch list, we never merge with it.
> >>
> >> Fix this from the direct issue path by flagging the request as
> >> REQ_NOMERGE so we don't change the size of it before issue.
> >>
> >> See also:
> >>   https://bugzilla.kernel.org/show_bug.cgi?id=201685
> >>
> >> Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
> >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> > 
> > Tested-by: Guenter Roeck <linux@xxxxxxxxxxxx>
> > 
> > ... on two systems affected by the problem.
> 
> Thanks for testing! And for being persistent in reproducing and
> providing clues for getting this nailed.
> 

My pleasure.

I see that there is some discussion about this patch.

Unfortunately, everyone running a 4.19 or later kernel is at serious
risk of data corruption. Given that, if this patch doesn't make it
upstream for one reason or another, would it be possible to at least
revert the two patches introducing the problem until this is sorted
out for good? If that is not acceptable either, could blk-mq be
marked as broken? After all, it _is_ broken. This is even more true
if it turns out, as suggested in the discussion, that the problem
dates back to 4.1.

Also, it seems to me that even with this problem fixed, blk-mq may not
be ready for prime time after all. With that in mind, maybe commit
d5038a13eca72 ("scsi: core: switch to scsi-mq by default") was a
bit premature. Should that be reverted?

Thanks,
Guenter