Re: [PATCH 1/2] Adds check for numberio during verify phase.

Juan Casse <jcasse@xxxxxxxxxx> · Tue, 27 Aug 2013 11:44:47 -0700

Jens,

Have I found an fio bug, or am I not understanding how it works?

I just ran the latest fio version without my changes, and it failed
with different block sizes for reads and writes.

fio --version
=========
fio-2.1.2-13-g3e10

error message
============
fio: bad verify type: 0
fio: verify.c:930: populate_hdr: Assertion `0' failed.
fio: pid=18551, got signal=6

job file
=====
readwrite=rw
randrepeat=1
size=64k
bs=4k,8k
ioengine=sync
direct=1
buffered=0
rwmixread=50
rwmixwrite=50
norandommap
loops=1
verify=meta
verify_pattern=0xffffffffffffffff
verify_dump=1
continue_on_error=verify

On Tue, Aug 27, 2013 at 10:50 AM, Juan Casse <jcasse@xxxxxxxxxx> wrote:
> Jens,
>
> sync IO:
> You're right, this should work. The code change simply adds the check
> for numberio to the existing fio infrastructure. I think my first
> attempt had some problems, and I did not test asynchronous IO with the
> finished code. I will run some tests to make sure that it works with
> asynchronous IO.
>
> equal-size reads/writes:
> I just realized, after running a quick test, that there is a problem.
> With different read/write sizes, fio exited with an error:
> fio: verify.c:960: populate_hdr: Assertion `0' failed.
> I think this has to do with the offsets of the read and write blocks
> being different when the blocks are of different sizes. I need to figure
> out what is going on. Any ideas, Jens?
>
> Here is the job I used:
> readwrite=randrw
> randrepeat=1
> size=1m
> bs=4k,8k
> ioengine=libaio
> direct=1
> buffered=0
> rwmixread=30
> rwmixwrite=70
> norandommap
> loops=2
> verify=meta
> verify_pattern=0xffffffffffffffff
> verify_dump=1
> continue_on_error=verify
>
> On Tue, Aug 27, 2013 at 10:22 AM, Grant Grundler <grundler@xxxxxxxxxxxx> wrote:
>> On Tue, Aug 27, 2013 at 10:02 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>> ...
>>>> +data_integrity_check
>>>> +             If this option is given, fio will check the i/o number of
>>>> +             each block read back during the verification phase. Fio
>>>> +             checks numberio to detect stale blocks. Currently, this
>>>> +             option requires synchronous i/o, and equal-sized read and
>>>> +             write blocks. This option requires workloads that write data.
>>>
>>> I think this use case is just too narrow.
>>
>> Jens,
>> This should be the standard behavior for data integrity checking.
>> To debug data corruption problems, we need to know whether it's stale
>> data or corrupted data. There is a huge difference in how to track each
>> one down and in their potential causes. I'm arguing it shouldn't even
>> be an option.
>>
>>> Why does it require sync IO and equal read/write sizes?
>>> Can't you just replay and re-generate and compare?
>>
>> I think these are just limitations of this implementation and its
>> testing. In principle, I think you are right. Perhaps Juan can explain
>> in more detail.
>>
>> cheers,
>> grant