Re: RFC: Data pattern buffer filling race condition fix

On 2010-11-09 12:53, Bart Van Assche wrote:
> On Mon, Nov 8, 2010 at 2:03 PM, Jens Axboe <jaxboe@xxxxxxxxxxxx> wrote:
>>
>> On 2010-11-07 13:58, Bart Van Assche wrote:
>>> On Sun, Nov 7, 2010 at 12:43 PM, Jens Axboe <jaxboe@xxxxxxxxxxxx> wrote:
>>>>
>>>> On 2010-11-06 10:35, Bart Van Assche wrote:
>>>>> On multicore non-x86 CPUs fio has been observed to frequently report false
>>>>> data verification failures with I/O engine libaio and I/O depths above one.
>>>>> This is because of a race condition in the function fill_pattern(). The code
>>>>> in that function only works correctly if all CPUs of a multicore system
>>>>> observe store instructions in the order they were issued. That is the case for
>>>>> multicore x86 systems but not for all other CPU families, such as the
>>>>> POWER CPU family.
>>>>>
>>>>> [ ... ]
>>
>> Forgive me, but I'm still a little confused. This second write_barrier()
>> is now protecting against the order of the fill and the length
>> assignment. IOW, if you see the new length, you are guaranteed to also
>> see the new content. This means that the first memory barrier should be
>> a read_barrier().
>>
>> And ditto for the other case.
>>
>> Can you verify whether that works as expected and send an updated patch?
> 
> Hello Jens,
> 
> I'm afraid that I will have to do more testing and that I'll have to
> make sure that I understand the entire fio code base before I can
> develop and send a new patch - something I do not have the time for
> now unfortunately. I ran into this issue on a 32-bit 2.6.34.7 kernel
> while running a test on a local ext3 filesystem, something I will have
> to analyze further before I can proceed:
> 
> $ valgrind ./fio --ioengine=libaio --overwrite=1 --verify=md5
> --iodepth=10 --direct=1 --loops=10 --size=1MB --name=test --thread
> --numjobs=10 --group_reporting
> ==13318== Memcheck, a memory error detector
> ==13318== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
> ==13318== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info
> ==13318== Command: ./fio --ioengine=libaio --overwrite=1 --verify=md5
> --iodepth=10 --direct=1 --loops=10 --size=1MB --name=test --thread
> --numjobs=10 --group_reporting
> ==13318==
> test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=10
> ...
> test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=10

This looks pretty straightforward - the file is created, but not filled
with a verifiable pattern. You need to run the workload with rw=write
at least once first; after that you can use a read-only verify workload
if you want.
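
As a rough sketch of that two-pass sequence (not tested here): the options
are copied from your command line, minus --thread, --numjobs, --loops and
--overwrite to keep it short, and the variable and function names are only
made up for illustration.

```shell
#!/bin/sh
# Options shared by both passes, taken from the command quoted above.
common="--ioengine=libaio --direct=1 --iodepth=10 --size=1MB --name=test --verify=md5"

# Pass 1 (hypothetical helper): write the file so that every block
# carries data that a later verify run can check.
write_pass()  { fio $common --rw=write; }

# Pass 2 (hypothetical helper): read the same file back and verify it.
verify_pass() { fio $common --rw=read; }

# Dry run: only print the two commands that the helpers would execute.
echo "fio $common --rw=write"
echo "fio $common --rw=read"
```

Running "write_pass && verify_pass" would then do the actual I/O. Both
passes use the same --name and --size, so they should operate on the
same file.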

-- 
Jens Axboe
