Data Integrity Test with fio

Hi Jens:

I wanted to let you know what we intend to do in advance in case you
can foresee any problems or have any preferences.

We would like fio to test for data integrity/retention as follows.

Phase 1: Write data to storage device
Run fio with a job file specifying the following options.
readwrite=
 - randwrite
   (does random writes)
 - randrw
   (does random reads and writes)
size=64k or any other size
(specifies the total amount of data that will be written/read)
bs=8192 or any other size
(specifies the size of each data block)
runtime=1 or any other number of seconds
(specifies the duration of the fio run)
time_based
(tells fio to keep running for the full runtime rather than stopping
once size bytes have been transferred)
verify=meta
(adds block number, numberio and timestamp to the block header)
verify_pattern=0xffffffffffffffff or any other pattern
(fills the rest of the block with the specified pattern)
verify_dump=1
(writes to a file the data read from disk and the data that was
expected, in case of data corruption)
continue_on_error=verify
(allows fio to continue even if corrupted blocks are found, otherwise
fio will stop execution on the first corrupted block)
random_generator=lfsr
(uses a linear feedback shift register, lfsr, as the random number
generator)
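
For illustration, the Phase 1 options above could be collected into a job file along these lines. This is only a sketch: the section name "phase1-write" and the target path /tmp/fio-integrity.dat are placeholders I made up, and the helper below merely generates the job file text using Python's configparser; it does not run fio.

```python
import configparser
import io


def phase1_job_text(filename="/tmp/fio-integrity.dat"):
    """Return fio job file text for the Phase 1 write run.

    The section name and the default filename are hypothetical; the
    option values are the examples listed in the message.
    """
    cp = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    cp["phase1-write"] = {
        "filename": filename,
        "readwrite": "randwrite",
        "size": "64k",
        "bs": "8192",
        "runtime": "1",
        "time_based": None,  # flag option, written without a value
        "verify": "meta",
        "verify_pattern": "0xffffffffffffffff",
        "verify_dump": "1",
        "continue_on_error": "verify",
        "random_generator": "lfsr",
    }
    buf = io.StringIO()
    # fio job files are conventionally written as key=value, without spaces
    cp.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(phase1_job_text())
```

The resulting text can be saved to a file and passed to fio as its job file argument.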

Phase 2: Verify data
(Recall that we mentioned adding a "generation" number to the block
header data? Well, we think we can use the existing numberio instead.)
Run fio with the same options as before plus specifying a new option:
 - data_integrity_check
(I already modified the code to accept this option).
When this option is given in the job file, fio will "replay" the
workload (without actually writing data to storage) to reach the end
of the lfsr sequence. We can then read each block back and compare its
numberio with the one obtained by running the lfsr in reverse. We run
the lfsr in reverse because, when a block is written multiple times,
the numberio of the last write is found toward the end of the lfsr
sequence, and the last write is the one that must be on storage.

The numberio is incremented each time we read or write, so it can
easily be computed going backwards by decrementing its value.

How is the block offset computed from the lfsr? Do you see any
problems trying to compute the offset going backwards with the lfsr?
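
As background for this question, here is a minimal sketch of why stepping an lfsr backwards is possible in principle. It uses a generic right-shifting Galois lfsr, which is an assumption on my part and not fio's own lfsr code; the tap mask 0xB400 and the function names are illustrative only. How fio maps an lfsr value to a block offset is not modelled here.

```python
def lfsr_next(state, taps=0xB400):
    """One forward step of a right-shifting Galois lfsr."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= taps
    return state


def lfsr_prev(state, taps=0xB400):
    """Undo one lfsr_next step.

    After a forward step the top bit is set only if the taps were
    XORed in, which happens exactly when the shifted-out bit was 1.
    That makes the step reversible.
    """
    top = 1 << (taps.bit_length() - 1)
    if state & top:
        return ((state ^ taps) << 1) | 1
    return state << 1


if __name__ == "__main__":
    s = 0xACE1
    print(hex(s), hex(lfsr_prev(lfsr_next(s))))
```

Because each forward step has exactly one predecessor, the sequence can be walked from the end toward the start without storing it.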

One way to perform the data integrity check is to verify each block in
order of block number (offset). For each block number, run the lfsr
backwards starting from the end until we hit the block number. We then
compare the numberio obtained by running the lfsr backwards with the
one read from storage.
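
The per-block check described above can be sketched with a toy model. Everything here is an assumption made for illustration: a 4-bit Galois lfsr (taps 0xC, seed 1), block number = lfsr value - 1, values beyond the number of blocks skipped, a write-only workload, and a dict standing in for the storage device. It is not fio's do_verify() code.

```python
TAPS, SEED = 0xC, 1  # toy 4-bit lfsr, cycles through the values 1..15


def lfsr_next(state):
    lsb = state & 1
    state >>= 1
    return state ^ TAPS if lsb else state


def lfsr_prev(state):
    if state & 0x8:  # top bit set means the taps were XORed in
        return ((state ^ TAPS) << 1) | 1
    return state << 1


def write_phase(total_ios, num_blocks):
    """Phase 1 stand-in: record the numberio last written to each block."""
    storage, state, numberio, end_state = {}, SEED, 0, None
    while numberio < total_ios:
        if state <= num_blocks:  # values beyond the device are skipped
            storage[state - 1] = numberio
            numberio += 1
            end_state = state
        state = lfsr_next(state)
    return storage, end_state


def expected_numberio(block, end_state, total_ios, num_blocks):
    """Walk the lfsr backwards from the end until `block` is hit."""
    state, numberio = end_state, total_ios - 1
    while numberio >= 0:
        if state <= num_blocks:
            if state - 1 == block:
                return numberio
            numberio -= 1  # numberio is decremented going backwards
        state = lfsr_prev(state)
    return None  # the block was never written


def verify(storage, end_state, total_ios, num_blocks):
    """Return the block numbers whose stored numberio is not the expected one."""
    return [b for b in range(num_blocks)
            if storage.get(b) != expected_numberio(b, end_state, total_ios, num_blocks)]


if __name__ == "__main__":
    storage, end_state = write_phase(20, 8)
    storage[6] = 4  # simulate a stale block left over from an earlier write
    print(verify(storage, end_state, 20, 8))
```

Walking backwards separately for every block repeats a lot of work; a single backward pass that records the first sighting of each block would presumably give the same expected values.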

Any concerns with this?

In the fio code, the function do_io() performs the workload specified,
whether it be writes, reads and writes, or just reads.
The function do_verify() is executed after do_io() only when the
workload does any writes. If the workload does only reads, do_verify()
is not executed. This function reads the blocks back and compares the
offset (block number). I already have code in place that checks for
numberio as well.

However, if the job file specifies to run based on time rather than
total number of bytes (setting runtime= and time_based), then
do_verify() is not performed. We would also need to run do_verify() in
this case to make sure that the correct data was indeed written to
storage.
Note: fio needs to be run based on time if we want numberio
incremented when a block is rewritten. If we set fio to run a number
of iterations instead (by specifying loops=int), the same numberio
will be written every time the block is rewritten.
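
The difference between the two modes can be pictured with a toy model. This is only my illustration of the behaviour described in the note, not fio code: the block order and the function name are made up, and "reset_each_pass" stands for what is described for loops=int.

```python
def last_numberio(order, passes, reset_each_pass):
    """Return the numberio last written to each block after `passes` passes.

    reset_each_pass=True models loops=int as described in the note
    (numbering restarts with every iteration); False models a
    time_based run where the counter keeps incrementing.
    """
    last = {}
    numberio = 0
    for _ in range(passes):
        if reset_each_pass:
            numberio = 0
        for block in order:
            last[block] = numberio
            numberio += 1
    return last


if __name__ == "__main__":
    order = [0, 5, 2, 4, 6, 7, 3, 1]  # hypothetical pseudo-random block order
    print(last_numberio(order, 3, reset_each_pass=True))
    print(last_numberio(order, 3, reset_each_pass=False))
```

In the first case a rewritten block carries the same numberio as before, so a stale copy cannot be told apart from the latest write; in the second case it can.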

Thanks,
Juan