On 8/11/20 1:05 AM, Dave Chinner wrote:
> On Mon, Aug 10, 2020 at 08:19:57PM -0600, Jens Axboe wrote:
>> On 8/10/20 8:00 PM, Dave Chinner wrote:
>>> On Mon, Aug 10, 2020 at 07:08:59PM +1000, Dave Chinner wrote:
>>>> On Mon, Aug 10, 2020 at 05:08:07PM +1000, Dave Chinner wrote:
>>>>> [cc Jens]
>>>>>
>>>>> [Jens, data corruption w/ io_uring and simple fio reproducer. see
>>>>> the bz link below.]
>>>
>>> Looks like an io_uring/fio bug at this point, Jens. All your go fast
>>> bits turn the buffered read into a short read, and neither fio nor
>>> the io_uring async buffered read path handles short reads. Details below.
>>
>> It's a fio issue. The io_uring engine uses a different path for short
>> IO completions, and that's being ignored by the backend... Hence the
>> IO just gets completed and not retried for this case, and that'll then
>> trigger verification as if it did complete. I'm fixing it up.
>
> I just updated fio to:
>
> cb7d7abb (HEAD -> master, origin/master, origin/HEAD) io_u: set io_u->verify_offset in fill_io_u()
>
> The workload still reports corruption almost instantly. Only this
> time, the trace is not reporting a short read.
>
> File is patterned with:
>
> verify_pattern=0x33333333%o-16
>
> Offset of "bad" data is 0x1240000.
>
> Expected:
>
> 00000000: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000010: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000020: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000030: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000040: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000050: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000060: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000070: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000080: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> .....
> 0000ffd0: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 0000ffe0: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
> 0000fff0: 33 33 33 33 00 10 24 01 00 00 00 00 f0 ff ff ff 3333............
>
>
> Received:
>
> 00000000: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000010: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000020: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000030: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000040: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000050: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000060: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000070: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 00000080: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> .....
> 0000ffd0: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 0000ffe0: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
> 0000fff0: 33 33 33 33 00 00 24 01 00 00 00 00 f0 ff ff ff 3333............
>
>
> Looks like the data in the expected buffer is wrong - the data
> pattern in the received buffer is correct according to the defined
> pattern.
>
> Error is 100% reproducible from the same test case. Same bad byte in
> the expected buffer dump every single time.

What job file are you running? It's not impossible that I broke
something else in fio, the io_u->verify_offset is a bit risky... I'll
get it fleshed out shortly.

-- 
Jens Axboe
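
For anyone following the dumps above, here is a rough sketch of how one 16-byte unit of the pattern appears to be laid out. The layout is only inferred from the hex dumps (a 4-byte 0x33333333 literal, the block offset as 8 little-endian bytes for %o, then -16 as 4 little-endian bytes); it is not taken from fio's source, and the helper names pattern_unit and hexdump_line are made up for illustration.

```python
import struct

def pattern_unit(offset: int) -> bytes:
    # Assumed layout, inferred from the dumps in this thread (not from fio):
    #   4 bytes  literal 0x33333333
    #   8 bytes  block offset, little-endian   (the %o part)
    #   4 bytes  -16 as a signed 32-bit value, little-endian
    return struct.pack("<IQi", 0x33333333, offset, -16)

def hexdump_line(data: bytes) -> str:
    # Space-separated hex bytes, same style as the dumps above.
    return " ".join(f"{b:02x}" for b in data)

# Unit stamped with the offset reported for the "bad" data.
print(hexdump_line(pattern_unit(0x1240000)))
# Unit stamped with an offset 0x1000 higher.
print(hexdump_line(pattern_unit(0x1241000)))
```

Under that assumed layout, the first line matches the received buffer and the second matches the expected buffer, i.e. the expected buffer carries an offset 0x1000 past the block offset of 0x1240000, which is consistent with the expected data being the wrong side.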