Re: [Bug 208827] [fio io_uring] io_uring write data crc32c verify failed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/11/20 1:05 AM, Dave Chinner wrote:
> On Mon, Aug 10, 2020 at 08:19:57PM -0600, Jens Axboe wrote:
>> On 8/10/20 8:00 PM, Dave Chinner wrote:
>>> On Mon, Aug 10, 2020 at 07:08:59PM +1000, Dave Chinner wrote:
>>>> On Mon, Aug 10, 2020 at 05:08:07PM +1000, Dave Chinner wrote:
>>>>> [cc Jens]
>>>>>
>>>>> [Jens, data corruption w/ io_uring and simple fio reproducer. see
>>>>> the bz link below.]
>>>
>>> Looks like a io_uring/fio bugs at this point, Jens. All your go fast
>>> bits turns the buffered read into a short read, and neither fio nor
>>> io_uring async buffered read path handle short reads. Details below.
>>
>> It's a fio issue. The io_uring engine uses a different path for short
>> IO completions, and that's being ignored by the backend... Hence the
>> IO just gets completed and not retried for this case, and that'll then
>> trigger verification as if it did complete. I'm fixing it up.
> 
> I just updated fio to:
> 
> cb7d7abb (HEAD -> master, origin/master, origin/HEAD) io_u: set io_u->verify_offset in fill_io_u()
> 
> The workload still reports corruption almost instantly. Only this
> time, the trace is not reporting a short read.
> 
> File is patterned with:
> 
> verify_pattern=0x33333333%o-16
> 
> Offset of "bad" data is 0x1240000.
> 
> Expected:
> 
> 00000000:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000010:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000020:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000030:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000040:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000050:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000060:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000070:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000080:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> .....
> 0000ffd0:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff  3333............
> 0000ffe0:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff  3333............
> 0000fff0:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff  3333............
> 
> 
> Received:
> 
> 00000000:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000010:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000020:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000030:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000040:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000050:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000060:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000070:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000080:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> .....
> 0000ffd0:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff  3333............
> 0000ffe0:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff  3333............
> 0000fff0:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff  3333............
> 
> 
> Looks like the data in the expected buffer is wrong - the data
> pattern in the received buffer is correct according the defined
> pattern.
> 
> Error is 100% reproducable from the same test case. Same bad byte in
> the expected buffer dump every single time.

What job file are you running? It's not impossible that I broken
something else in fio, the io_u->verify_offset is a bit risky... I'll
get it fleshed out shortly.

-- 
Jens Axboe




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux