[Bug 208827] [fio io_uring] io_uring write data crc32c verify failed

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Tue, 11 Aug 2020 07:05:12 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=208827

--- Comment #14 from Dave Chinner (david@xxxxxxxxxxxxx) ---
On Mon, Aug 10, 2020 at 08:19:57PM -0600, Jens Axboe wrote:
> On 8/10/20 8:00 PM, Dave Chinner wrote:
> > On Mon, Aug 10, 2020 at 07:08:59PM +1000, Dave Chinner wrote:
> >> On Mon, Aug 10, 2020 at 05:08:07PM +1000, Dave Chinner wrote:
> >>> [cc Jens]
> >>>
> >>> [Jens, data corruption w/ io_uring and simple fio reproducer. see
> >>> the bz link below.]
> > 
> > Looks like a io_uring/fio bugs at this point, Jens. All your go fast
> > bits turns the buffered read into a short read, and neither fio nor
> > io_uring async buffered read path handle short reads. Details below.
> 
> It's a fio issue. The io_uring engine uses a different path for short
> IO completions, and that's being ignored by the backend... Hence the
> IO just gets completed and not retried for this case, and that'll then
> trigger verification as if it did complete. I'm fixing it up.

I just updated fio to:

cb7d7abb (HEAD -> master, origin/master, origin/HEAD) io_u: set
io_u->verify_offset in fill_io_u()

The workload still reports corruption almost instantly. Only this
time, the trace is not reporting a short read.

File is patterned with:

verify_pattern=0x33333333%o-16

Offset of "bad" data is 0x1240000.

Expected:

00000000:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
00000010:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
00000020:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
00000030:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
00000040:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
00000050:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
00000060:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
00000070:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
00000080:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
.....
0000ffd0:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
0000ffe0:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
0000fff0:  33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............

Received:

00000000:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
00000010:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
00000020:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
00000030:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
00000040:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
00000050:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
00000060:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
00000070:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
00000080:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
.....
0000ffd0:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
0000ffe0:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
0000fff0:  33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............

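Looks like the data in the expected buffer is wrong - the data
pattern in the received buffer is correct according to the defined
pattern. The offset field in the expected buffer decodes to
0x1241000, not the 0x1240000 of the block being verified.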
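
For anyone wanting to check the two dumps by hand, here is a rough
sketch that rebuilds one 16-byte repetition of the pattern. It assumes
the layout the dumps suggest: the 4 literal bytes of 0x33333333, then
%o as the block offset in 8 little-endian bytes, then -16 as a signed
32-bit little-endian integer. The helper names pattern_unit() and
embedded_offset() are made up for illustration and are not fio code.

```python
import struct

def pattern_unit(block_offset: int) -> bytes:
    """Build one 16-byte repetition of 0x33333333%o-16 (assumed layout)."""
    literal = bytes.fromhex("33333333")          # 0x33333333
    offset = struct.pack("<Q", block_offset)     # %o: 64-bit LE block offset
    tail = struct.pack("<i", -16)                # -16: 32-bit LE signed int
    return literal + offset + tail

def embedded_offset(unit: bytes) -> int:
    """Extract the 64-bit offset stored at bytes 4..11 of a repetition."""
    return struct.unpack_from("<Q", unit, 4)[0]

# First line of each dump, copied from this mail.
expected = bytes.fromhex("33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff")
received = bytes.fromhex("33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff")

bad_offset = 0x1240000
print("received matches pattern:", pattern_unit(bad_offset) == received)
print("expected matches pattern:", pattern_unit(bad_offset) == expected)
print("offset in expected buffer:", hex(embedded_offset(expected)))
print("offset in received buffer:", hex(embedded_offset(received)))
```

Under those assumptions the received line is what the pattern should
produce for offset 0x1240000, while the expected line carries an
offset that is 0x1000 (4096 bytes) higher.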

Error is 100% reproducible from the same test case. Same bad byte in
the expected buffer dump every single time.

-Dave.
