On Mon, Nov 8, 2010 at 2:03 PM, Jens Axboe <jaxboe@xxxxxxxxxxxx> wrote: > > On 2010-11-07 13:58, Bart Van Assche wrote: > > On Sun, Nov 7, 2010 at 12:43 PM, Jens Axboe <jaxboe@xxxxxxxxxxxx> wrote: > >> > >> On 2010-11-06 10:35, Bart Van Assche wrote: > >>> On multicore non-x86 CPUs fio has been observed to frequently reports false > >>> data verification failures with I/O engine libaio and I/O depths above one. > >>> This is because of a race condition in the function fill_pattern(). The code > >>> in that function only works correct if all CPUs of a multicore system > >>> observe store instructions in the order they were issued. That is the case for > >>> multicore x86 systems but not for all other CPU families, such as e.g. the > >>> POWER CPU family. > >>> > >>> [ ... ] > > Forgive me, but I'm still a little confused. This second write_barrier() > is now protecting against the order of the fill and the length > assignment. IOW, if you see the new length, you are guaranteed to also > see the new content. This means that the first memory barrier should be > a read_barrier(). > > And ditto for the other case. > > Can you verify whether that works as expected and send an updated patch? Hello Jens, I'm afraid that I will have to do more testing and that I'll have to make sure that I understand the entire fio code base before I can develop and send a new patch - something I do not have the time for now unfortunately. I ran into this issue on 32-bit 2.6.34.7 kernel while running a test on a local ext3 filesystem, something I will have to analyze further before I can proceed: $ valgrind ./fio --ioengine=libaio --overwrite=1 --verify=md5 --iodepth=10 --direct=1 --loops=10 --size=1MB --name=test --thread --numjobs=10 --group_reporting ==13318== Memcheck, a memory error detector ==13318== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==13318== Using Valgrind-3.7.0.SVN and LibVEX; rerun with -h for copyright info ==13318== Command: ./fio --ioengine=libaio --overwrite=1 --verify=md5 --iodepth=10 --direct=1 --loops=10 --size=1MB --name=test --thread --numjobs=10 --group_reporting ==13318== test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=10 ... test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=10 Starting 10 threads verify: bad magic header 0, wanted f00baaef at file test.9.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.2.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.3.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.3.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.8.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.8.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.2.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.1.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.1.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.6.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.6.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.5.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.5.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.4.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.4.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.7.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.7.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.10.0 offset 0, length 0 verify: bad magic header 0, wanted f00baaef at file test.10.0 offset 4096, length 0 verify: bad magic header 0, wanted f00baaef at file test.9.0 offset 4096, length 0 fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character fio: pid=13318, err=84/file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character test: (groupid=0, jobs=10): err=84 (file:io_u.c:1346, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=13318 read : io=81920 B, bw=576901 B/s, iops=704 , runt= 142msec slat (usec): min=26 , max=6642 , avg=40.86, stdev=21.59 clat (msec): min=1 , max=99 , avg=41.51, stdev= 6.25 lat (msec): min=1 , max=100 , avg=41.56, stdev= 6.25 cpu : usr=73.15%, sys=20.23%, ctx=598, majf=0, minf=1405 IO depths : 1=10.0%, 2=20.0%, 4=40.0%, 8=30.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued r/w/d: total=100/0/0, short=0/0/0 lat (msec): 2=1.00%, 4=3.00%, 10=1.00%, 50=6.00%, 100=9.00% Run status group 0 (all jobs): READ: io=80KB, aggrb=563KB/s, minb=576KB/s, maxb=576KB/s, mint=142msec, maxt=142msec Disk stats (read/write): sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan% ==13318== ==13318== HEAP SUMMARY: ==13318== in use at exit: 18,102 bytes in 114 blocks ==13318== total heap usage: 370 allocs, 256 frees, 849,743 bytes allocated ==13318== ==13318== LEAK SUMMARY: ==13318== definitely lost: 2,401 bytes in 40 blocks ==13318== indirectly lost: 15,680 bytes in 70 blocks ==13318== possibly lost: 0 bytes in 0 blocks ==13318== still reachable: 21 bytes in 4 blocks ==13318== suppressed: 0 bytes in 0 blocks ==13318== Rerun with --leak-check=full to see details of leaked memory ==13318== ==13318== For counts of detected and suppressed errors, rerun with: -v ==13318== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 5) Bart. -- To unsubscribe from this list: send the line "unsubscribe fio" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html