Re: [PATCH 2/2] Adds check for rand_seed during verify phase.

Jens Axboe <axboe@xxxxxxxxx> · Tue, 27 Aug 2013 11:02:09 -0600

On 08/26/2013 04:11 PM, Grant Grundler wrote:
> Jens, et al,
> 
> In general I'm ok with this patch...just one issue I'd like to get an
> opinion on...
> 
> On Mon, Aug 26, 2013 at 2:22 PM, Juan Casse <jcasse@xxxxxxxxxxxx> wrote:
>> Improve data integrity checking and detect stale blocks from previous
>> runs.
>>
>> Signed-off-by: Juan Casse <jcasse@xxxxxxxxxxxx>
>> Reviewed-by: Grant Grundler <grundler@xxxxxxxxxxxx>
>> ---
>>  verify.c | 45 ++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 36 insertions(+), 9 deletions(-)
>>
>> diff --git a/verify.c b/verify.c
>> index 83c5735..deabf76 100644
>> --- a/verify.c
>> +++ b/verify.c
>> @@ -648,18 +648,17 @@ static int verify_header(struct io_u *io_u, struct verify_header *hdr)
>>         uint32_t crc;
>>
>>         if (hdr->magic != FIO_HDR_MAGIC)
>> -               return 0;
>> -       if (hdr->len > io_u->buflen) {
>> -               log_err("fio: verify header exceeds buffer length (%u > %lu)\n", hdr->len, io_u->buflen);
>> -               return 0;
>> -       }
>> +               return 1;
>> +       if (hdr->len > io_u->buflen)
>> +               return 2;
>> +       if (hdr->rand_seed != io_u->rand_seed)
>> +               return 3;
>>
>>         crc = fio_crc32c(p, offsetof(struct verify_header, crc32));
>>         if (crc == hdr->crc32)
>> -               return 1;
>> -
>> +               return 0;
>>         log_err("fio: verify header crc %x, calculated %x\n", hdr->crc32, crc);
>> -       return 0;
>> +       return 4;
>>  }
>>
>>  int verify_io_u(struct thread_data *td, struct io_u *io_u)
>> @@ -696,13 +695,41 @@ int verify_io_u(struct thread_data *td, struct io_u *io_u)
>>                         memswp(p, p + td->o.verify_offset, header_size);
>>                 hdr = p;
>>
>> -               if (!verify_header(io_u, hdr)) {
>> +               ret = verify_header(io_u, hdr);
>> +               switch (ret) {
>> +               case 0:
>> +                       break;
>> +               case 1:
>>                         log_err("verify: bad magic header %x, wanted %x at "
>>                                 "file %s offset %llu, length %u\n",
>>                                 hdr->magic, FIO_HDR_MAGIC,
>>                                 io_u->file->file_name,
>>                                 io_u->offset + hdr_num * hdr->len, hdr->len);
>>                         return EILSEQ;
>> +                       break;
>> +               case 2:
>> +                       log_err("fio: verify header exceeds buffer length (%u "
>> +                               "> %lu)\n", hdr->len, io_u->buflen);
>> +                       return EILSEQ;
> 
> Should we return different error values for each different possible mismatch?
> 
> I think we should but don't know which error values would be preferred here.
> 
> The reason is automated testing will log a failure and I want a
> different failure code for each possible failure mode. That way, I can
> easily tell if a different data integrity check fails and which
> systems are reporting a particular error code.
> 
> I understand, some error codes may have many different root causes.
> But I should be able to tell the difference between "magic number is
> wrong", "IO Timed out", and "generation number is wrong" at the first
> cut.

Right now fio will return EILSEQ for any verification failure. If you
time out, then no, you should not get that. If we do more fine grained
errors for verify failures, then we would have to get creative on what
to use. And not sure I'm a huge fan of that. If you do get a
verification failure, though, at least the runtime log should tell you
EXACTLY what it was.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html