Re: [PATCH] blk-mq: fix corruption with direct issue

Jens Axboe <axboe@xxxxxxxxx> · Wed, 5 Dec 2018 10:59:21 -0700

On 12/5/18 10:55 AM, Guenter Roeck wrote:
> On Tue, Dec 04, 2018 at 07:25:05PM -0700, Jens Axboe wrote:
>> On 12/4/18 6:38 PM, Guenter Roeck wrote:
>>> On Tue, Dec 04, 2018 at 03:47:46PM -0700, Jens Axboe wrote:
>>>> If we attempt a direct issue to a SCSI device, and it returns BUSY, then
>>>> we queue the request up normally. However, the SCSI layer may have
>>>> already setup SG tables etc for this particular command. If we later
>>>> merge with this request, then the old tables are no longer valid. Once
>>>> we issue the IO, we only read/write the original part of the request,
>>>> not the new state of it.
>>>>
>>>> This causes data corruption, and is most often noticed with the file
>>>> system complaining about the just read data being invalid:
>>>>
>>>> [  235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)
>>>>
>>>> because most of it is garbage...
>>>>
>>>> This doesn't happen from the normal issue path, as we will simply defer
>>>> the request to the hardware queue dispatch list if we fail. Once it's on
>>>> the dispatch list, we never merge with it.
>>>>
>>>> Fix this from the direct issue path by flagging the request as
>>>> REQ_NOMERGE so we don't change the size of it before issue.
>>>>
>>>> See also:
>>>>   https://bugzilla.kernel.org/show_bug.cgi?id=201685
>>>>
>>>> Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>
>>> Tested-by: Guenter Roeck <linux@xxxxxxxxxxxx>
>>>
>>> ... on two systems affected by the problem.
>>
>> Thanks for testing! And for being persistent in reproducing and
>> providing clues for getting this nailed.
>>
> 
> My pleasure.
> 
> I see that there is some discussion about this patch.
> 
> Unfortunately, everyone running a 4.19 or later kernel is at serious
> risk of data corruption. Given that, if this patch doesn't make it
> upstream for one reason or another, would it be possible to at least
> revert the two patches introducing the problem until this is sorted
> out for good ? If this is not acceptable either, maybe mark blk-mq
> as broken ? After all, it _is_ broken. This is even more true if it
> turns out that a problem may exist since 4.1, as suggested in the
> discussion.

It is queued up, it'll go upstream later today.

> Also, it seems to me that even with this problem fixed, blk-mq may not
> be ready for primetime after all. With that in mind, maybe commit
> d5038a13eca72 ("scsi: core: switch to scsi-mq by default") was a
> bit premature. Should that be reverted ?

I have to strongly disagree with that, the timing is just unfortunate.
There are literally millions of machines running blk-mq/scsi-mq, and
this is the only hickup we've had. So I want to put this one to rest
once and for all, there's absolutely no reason not to continue with
what we've planned.

-- 
Jens Axboe