OK, sounds like 345 and 359 are the ones desired. I did have overwrite=1 in my original test, but I removed it while trying to strip out as many non-essential parameters as possible. I have just added it back:

# fio --name=DI_Stress --ioengine=libaio --direct=1 --rw=randwrite --norandommap --randrepeat=0 --iodepth=16 --size=100% --numjobs=1 --bs=2048k --filename=/dev/nvme1n1 --output=DI_Stress --verify=crc32c-intel --verify_fatal=1 --verify_dump=1 --verify_backlog=32768 --overwrite=1

crc32c: verify failed at file /dev/nvme1n1 offset 2013628727296, length 2097152]
       Expected CRC: a6dbc3dc
       Received CRC: dfa3362b
received data dumped as nvme1n1.2013628727296.received
expected data dumped as nvme1n1.2013628727296.expected

Regards,
Jeff

-----Original Message-----
From: Sitsofe Wheeler [mailto:sitsofe@xxxxxxxxx]
Sent: Monday, August 7, 2017 3:25 PM
To: Jeff Furlong <jeff.furlong@xxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>; fio@xxxxxxxxxxxxxxx; Rebecca Cran <rebecca@xxxxxxxxxxxx>; Tomohiro Kusumi <tkusumi@xxxxxxxxxx>
Subject: Re: Pending fio 3.0 release

On 7 August 2017 at 23:02, Jeff Furlong <jeff.furlong@xxxxxxx> wrote:
> Are the 3 serialize changes (pull requests 343/345/359) controversial or not complete? I hit the same issue in April with a similar workload:

A fair question Jeff. If memory serves, https://github.com/axboe/fio/pull/343 ("Add serialize_overlap option") is controversial because it took a heavy-handed approach with potentially high overhead: it checks every I/O against every other inflight I/O to determine whether it should flush, so as to avoid generating two in-flight I/Os that cover overlapping regions. To use it you had to set an explicit option (which defaulted to off), and it did the job though...

https://github.com/axboe/fio/pull/345 is hopefully uncontroversial, as it just removes an optimisation that could turn off overlap checking when it was really needed for jobs that can write the same region twice, which would otherwise lead to spurious mismatches at verification time.

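As a rough illustration of the kind of check pull request 343 is described as doing above, here is a minimal Python sketch. It is not fio's C implementation, and the names `InflightIO`, `ranges_overlap` and `must_drain_before_submit` are hypothetical. It only shows why comparing each new I/O against every in-flight I/O costs time proportional to the queue depth on every submission:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InflightIO:
    """One in-flight I/O, described by its byte offset and length (hypothetical type)."""
    offset: int
    length: int


def ranges_overlap(a: InflightIO, b: InflightIO) -> bool:
    """Return True if the half-open byte ranges of a and b share at least one byte."""
    return a.offset < b.offset + b.length and b.offset < a.offset + a.length


def must_drain_before_submit(new_io: InflightIO, inflight: list) -> bool:
    """Compare new_io against every in-flight I/O (a linear scan per submission).

    True means the caller should wait for outstanding I/O to complete before
    submitting new_io, so that two overlapping writes are never in flight together.
    """
    return any(ranges_overlap(new_io, io) for io in inflight)


if __name__ == "__main__":
    bs = 2048 * 1024  # 2 MiB, matching --bs=2048k in the job above
    queue = [InflightIO(0, bs), InflightIO(4 * bs, bs)]
    print(must_drain_before_submit(InflightIO(4 * bs, bs), queue))  # same block as a queued write
    print(must_drain_before_submit(InflightIO(2 * bs, bs), queue))  # untouched block
```

With --iodepth=16 the scan is at most 16 comparisons per submitted I/O, but the cost grows with the queue depth, which is presumably the overhead concern mentioned above.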
https://github.com/axboe/fio/pull/359 narrowed the window that was creating a double free situation when two overlapping inflight writes occurred. It's imperfect but it made the problem I was seeing go away even if I can theorise it won't prevent the problem in every case.

>
> # fio --name=DI_Stress --ioengine=libaio --direct=1 --rw=randwrite
> --norandommap --randrepeat=0 --iodepth=16 --size=100% --numjobs=1
> --bs=2048k --filename=/dev/nvme1n1 --output=DI_Stress
> --verify=crc32c-intel --verify_fatal=1 --verify_dump=1
> --verify_backlog=32768
>
> I tried again now with fio 2.99 and reproduced after a few minutes:
>
> verify: bad header numberio 4531, wanted 4532 at file /dev/nvme1n1 offset 2643506233344, length 2097152
> hdr_fail data dumped as nvme1n1.2643506233344.hdr_fail
>
> So any large block random write seems to be at risk (larger the block, higher the risk). If there are other parameters I should be setting to avoid the issue, please let me know. Thanks.

If you set --overwrite=1 does the problem you're seeing go away? If so it sounds like you're hitting https://github.com/axboe/fio/issues/335 ...

--
Sitsofe | http://sucs.org/~sits/