Re: FIO verify bad header question

Sitsofe Wheeler <sitsofe@xxxxxxxxx> · Wed, 15 Jan 2020 23:47:29 +0000

Hi,

On Wed, 15 Jan 2020 at 15:32, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> Please use the fio list for this, CC'ed.
>
> On 1/15/20 8:28 AM, Michael Chichik wrote:
> >
> > */usr/bin/fio --time_based --runtime=1d --ioengine=libaio --iodepth=32 --rw=randrw --bs=24k --direct=0 --io_size=60G --numjobs=10 --size=1G --rwmixread=0 --verify=md5 --verify_fatal=1 --output-format=json --output=4_randrw_0_10_32_libaio_xfs_1G.json --directory /mnt/test-vol-23206-36 --name host1_test-vol-23206-36

^^^ numjobs=10 means all ten jobs are working inside the same
region. Because you're using randrw, it's likely each job will write
its blocks in a different order from the other jobs.

> > *
> > FIO failed with the following error:
> > *
> > *
> > *verify: bad header rand_seed 1089408830752521594, wanted 7280637923435198810 at file /mnt/test-vol-23206-36/filename offset 776945664, length 24576

This means job 2 can interfere with the writes of job 1, so when job
1 comes to do a verify it can't find the blocks it wrote itself
(because it may find job 2's data instead). If you're going to use
numjobs I'd strongly suggest looking into offset_increment
(https://fio.readthedocs.io/en/latest/fio_man.html#cmdoption-arg-offset-increment)
and size
(https://fio.readthedocs.io/en/latest/fio_man.html#cmdoption-arg-size)
to ensure each job is working within a different region from every
other job.
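
As a rough sketch of what offset_increment does: the snippet below
prints the byte range each of the ten jobs would work in, using the
documented rule that a job starts at offset + offset_increment * job
number, counting from 0. The 1G increment is my assumption (chosen to
match --size=1G, it is not in the original command) and job_start is
just an illustrative helper, not part of fio.

```shell
#!/bin/sh
# Sketch only: compute the region each numjobs clone would cover.
SIZE=$((1024 * 1024 * 1024))   # --size=1G, bytes covered by each job
INCREMENT=$SIZE                # assumed --offset_increment=1G
NUMJOBS=10                     # --numjobs=10

job_start() {
    # $1 is the zero-based job number; the base offset is taken as 0
    echo $(( $1 * INCREMENT ))
}

n=0
while [ "$n" -lt "$NUMJOBS" ]; do
    start=$(job_start "$n")
    end=$(( start + SIZE - 1 ))
    echo "job $n: bytes $start-$end"
    n=$(( n + 1 ))
done
```

With the increment at least as large as size, no two ranges overlap.
The matching fio options would be along the lines of
--offset_increment=1G --size=1G, and the file or device presumably has
to be big enough to hold all ten regions.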

> > *
> >
> > but when we compare the block level data is consistent.
> >
> > what is the meaning of the Error? does it really indicate data corruption?

It means the data a given job wrote isn't there. If no one else has
interfered with the data, that usually means corruption (but in this
case other jobs may be interfering). Also, using randrw with verify
makes things complicated: are you expecting it to verify data that it
might try to read in the first pass but that the job has never
written?

--
Sitsofe | http://sucs.org/~sits/