As written, the verify function doesn't appear to have logic for
detecting dropped writes (stale data) for data at rest. There are only
two temporally-variant fields presently utilized in the verify pattern:
	verify_header.rand_seed
	verify_header.numberio
These fields are verified during read+write invocations but not for
read-only invocations. This means a dropped write for the most recent
write to a given block won't be detected, because all the
non-temporally-variant fields will pass verification. This is
particularly problematic
when reusing a device for separate fio invocations during a series of
tests, as there will be valid but stale data at rest from previous
invocations.
For example, consider a user doing either of the following after
previous fio invocations against the same device:
1) Performs a write workload, without verify. When complete, runs a
subsequent invocation with a read/verify-only workload against the same
dataset.
2) Performs a write workload and uses a trigger to perform a
power-interruption test, then runs a subsequent invocation with a
read/verify-only workload, using verify_state_load=1.
In both cases, a dropped write that leaves a block from an earlier
invocation in place would pass verification.
It could be argued the onus is on the user to wipe data before every
invocation, but I'm not sure that's reasonable.
I'd like to implement an invocation-variant check that will catch any
data at rest that is stale relative to previous invocations. There
would be an invocation-unique identifier, either passed via a
command-line option or generated randomly. It would be added to
verify_header and checked during all verify-reads. To support its use
for subsequent read-only invocations, it would be added to the
verify_state file and used whenever verify_state_load=1. The check
would also be used when the identifier is specified on the command
line.
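
A rough sketch of what the check could look like. The struct below is a
simplified stand-in, not fio's actual verify_header, and the
invocation_id field and sketch_* names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for fio's verify_header (not the real layout). */
struct sketch_verify_header {
	uint64_t rand_seed;
	uint16_t numberio;
	uint64_t invocation_id;	/* proposed new field (hypothetical) */
};

/*
 * Returns true if the block was written by the expected invocation.
 * expected_id would come from the command line or from a loaded
 * verify state file; both sources are assumptions of this sketch.
 */
static bool sketch_check_invocation(const struct sketch_verify_header *hdr,
				    uint64_t expected_id)
{
	return hdr->invocation_id == expected_id;
}
```

Because the comparison doesn't depend on knowing the last numberio for
a block, it could run for read-only invocations as well.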
An alternative would be to use the existing verify_header.time_sec field
and check for any blocks older than the start time of the most recent
invocation, which we'd encode in the state file. This would make a
command-line option for specifying the time a little more cumbersome
than an opaque identifier.
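
The time-based variant might reduce to a comparison like the one below.
This is only a sketch: the helper name is made up, and it assumes the
invocation start time is available in the same units as time_sec:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a block is considered stale if its header
 * timestamp predates the start of the invocation being verified.
 * With one-second granularity, a block stamped in the same second as
 * the start time is treated as current, so a stale block written in
 * that same second would slip through.
 */
static bool sketch_block_is_stale(uint32_t hdr_time_sec,
				  uint32_t invocation_start_sec)
{
	return hdr_time_sec < invocation_start_sec;
}
```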
Note this won't catch dropped writes when a block is written multiple
times within a given invocation, as that would require a block-specific
sidecar map that tracks write counts per block (or stores a subset of
the hash for the most recent write for each block). I've implemented
such a feature in a proprietary tool and would consider it for fio if
there's interest. The downside is the creation of, and dependency on, a
large sidecar file. The upside is it would add verification support for
sparsely-random workloads.
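
To give an idea of the sidecar approach, here is a minimal in-memory
sketch of the write-count variant. All names are hypothetical, it is
not taken from the proprietary tool, and it leaves out persisting the
map to a file. It also assumes the per-block write count is stored on
media alongside the data:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sidecar map: one write counter per block. */
struct sketch_sidecar {
	uint64_t nr_blocks;
	uint16_t *write_count;
};

static struct sketch_sidecar *sketch_sidecar_alloc(uint64_t nr_blocks)
{
	struct sketch_sidecar *sc = malloc(sizeof(*sc));

	if (!sc)
		return NULL;
	sc->write_count = calloc((size_t)nr_blocks, sizeof(*sc->write_count));
	if (!sc->write_count) {
		free(sc);
		return NULL;
	}
	sc->nr_blocks = nr_blocks;
	return sc;
}

static void sketch_sidecar_free(struct sketch_sidecar *sc)
{
	if (sc) {
		free(sc->write_count);
		free(sc);
	}
}

/* Called for each write issued to a block. */
static void sketch_sidecar_record_write(struct sketch_sidecar *sc,
					uint64_t block)
{
	if (block < sc->nr_blocks)
		sc->write_count[block]++;
}

/*
 * Compares the count read back from media against the count recorded
 * in the map. A mismatch means at least one write was dropped.
 */
static bool sketch_sidecar_verify(const struct sketch_sidecar *sc,
				  uint64_t block, uint16_t count_on_media)
{
	return block < sc->nr_blocks &&
	       sc->write_count[block] == count_on_media;
}
```

With two bytes per block, the map grows linearly with the number of
blocks under test, which is where the large sidecar file comes from.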
Code references for the temporally-variant fields not being used for
read-only workloads:
verify_io_u() forces the seeds to match the header's seed when !td_rw():

	/*
	 * Make rand_seed check pass when have verify_backlog or
	 * zone reset frequency for zonemode=zbd.
	 */
	if (!td_rw(td) || (td->flags & TD_F_VER_BACKLOG) ||
	    td->o.zrf.u.f)
		io_u->rand_seed = hdr->rand_seed;

verify_header() bypasses numberio check for read-only invocations:

	/*
	 * For read-only workloads, the program cannot be certain of the
	 * last numberio written to a block. Checking of numberio will be
	 * done only for workloads that write data. For verify_only,
	 * numberio check is skipped.
	 */
	if (td_write(td) && (td_min_bs(td) == td_max_bs(td)) &&
	    !td->o.time_based)
		if (!td->o.verify_only)
			if (hdr->numberio != io_u->numberio) {
				log_err("verify: bad header numberio %"PRIu16
					", wanted %"PRIu16,
					hdr->numberio, io_u->numberio);
				goto err;
			}

Adam (horshack@xxxxxxxx)