Re: RFC: Data pattern buffer filling race condition fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 8, 2010 at 2:03 PM, Jens Axboe <jaxboe@xxxxxxxxxxxx> wrote:
>
> On 2010-11-07 13:58, Bart Van Assche wrote:
> > On Sun, Nov 7, 2010 at 12:43 PM, Jens Axboe <jaxboe@xxxxxxxxxxxx> wrote:
> >>
> >> On 2010-11-06 10:35, Bart Van Assche wrote:
> >>> On multicore non-x86 CPUs fio has been observed to frequently reports false
> >>> data verification failures with I/O engine libaio and I/O depths above one.
> >>> This is because of a race condition in the function fill_pattern(). The code
> >>> in that function only works correct if all CPUs of a multicore system
> >>> observe store instructions in the order they were issued. That is the case for
> >>> multicore x86 systems but not for all other CPU families, such as e.g. the
> >>> POWER CPU family.
> >>>
> >>> [ ... ]
>
> Forgive me, but I'm still a little confused. This second write_barrier()
> is now protecting against the order of the fill and the length
> assignment. IOW, if you see the new length, you are guaranteed to also
> see the new content. This means that the first memory barrier should be
> a read_barrier().
>
> And ditto for the other case.
>
> Can you verify whether that works as expected and send an updated patch?

Hello Jens,

I'm afraid that I will have to do more testing and that I'll have to
make sure that I understand the entire fio code base before I can
develop and send a new patch - something I do not have the time for
now unfortunately. I ran into this issue on 32-bit 2.6.34.7 kernel
while running a test on a local ext3 filesystem, something I will have
to analyze further before I can proceed:

$ valgrind ./fio --ioengine=libaio --overwrite=1 --verify=md5
--iodepth=10 --direct=1 --loops=10 --size=1MB --name=test --thread
--numjobs=10 --group_reporting
==13318== Memcheck, a memory error detector
==13318== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==13318== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info
==13318== Command: ./fio --ioengine=libaio --overwrite=1 --verify=md5
--iodepth=10 --direct=1 --loops=10 --size=1MB --name=test --thread
--numjobs=10 --group_reporting
==13318==
test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=10
...
test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=10
Starting 10 threads
verify: bad magic header 0, wanted f00baaef at file test.9.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.2.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.3.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.3.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.8.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.8.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.2.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.1.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.1.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.6.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.6.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.5.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.5.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.4.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.4.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.7.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.7.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.10.0 offset 0, length 0
verify: bad magic header 0, wanted f00baaef at file test.10.0 offset
4096, length 0
verify: bad magic header 0, wanted f00baaef at file test.9.0 offset
4096, length 0
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character
fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete,
error=Invalid or incomplete multibyte or wide character

test: (groupid=0, jobs=10): err=84 (file:io_u.c:1346,
func=io_u_queued_complete, error=Invalid or incomplete multibyte or
wide character): pid=13318
  read : io=81920 B, bw=576901 B/s, iops=704 , runt=   142msec
    slat (usec): min=26 , max=6642 , avg=40.86, stdev=21.59
    clat (msec): min=1 , max=99 , avg=41.51, stdev= 6.25
     lat (msec): min=1 , max=100 , avg=41.56, stdev= 6.25
  cpu          : usr=73.15%, sys=20.23%, ctx=598, majf=0, minf=1405
  IO depths    : 1=10.0%, 2=20.0%, 4=40.0%, 8=30.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w/d: total=100/0/0, short=0/0/0

     lat (msec): 2=1.00%, 4=3.00%, 10=1.00%, 50=6.00%, 100=9.00%

Run status group 0 (all jobs):
   READ: io=80KB, aggrb=563KB/s, minb=576KB/s, maxb=576KB/s,
mint=142msec, maxt=142msec

Disk stats (read/write):
  sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%
==13318==
==13318== HEAP SUMMARY:
==13318==     in use at exit: 18,102 bytes in 114 blocks
==13318==   total heap usage: 370 allocs, 256 frees, 849,743 bytes allocated
==13318==
==13318== LEAK SUMMARY:
==13318==    definitely lost: 2,401 bytes in 40 blocks
==13318==    indirectly lost: 15,680 bytes in 70 blocks
==13318==      possibly lost: 0 bytes in 0 blocks
==13318==    still reachable: 21 bytes in 4 blocks
==13318==         suppressed: 0 bytes in 0 blocks
==13318== Rerun with --leak-check=full to see details of leaked memory
==13318==
==13318== For counts of detected and suppressed errors, rerun with: -v
==13318== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 5)

Bart.
--
To unsubscribe from this list: send the line "unsubscribe fio" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel]     [Linux SCSI]     [Linux IDE]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux