On Fri, 12 Apr 2013 17:03:12 -0400, Theodore Ts'o <tytso@xxxxxxx> wrote:
> Hi Dmitry,
>
> I've been noticing that the relatively new test #299 (which I didn't use
> in the previous development cycle) is failing for me, both for the
> current ext4 dev branch, as well as v3.9-rc5-1-g8cde7ad (the
> origin/branch point from Linus's tree for the dev branch).
>
> Is this test passing for you, and is there some patch which I'm missing
> which addresses this?
>
> Thanks,
>
> - Ted
>
>
> generic/299 [16:34:59][ 155.348963] fio (3364) used greatest stack depth: 5280 bytes left
> [ 156.195750] fio (3366) used greatest stack depth: 5184 bytes left
> [ 156.243934] fio (3363) used greatest stack depth: 4960 bytes left
> [ 361.330343] INFO: task umount:3426 blocked for more than 120 seconds.
> [ 361.331097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 361.331823] f4361d90 00000046 f043a000 c16a0ac0 c16a0ac0 75e421ae 00000028 00000000
> [ 361.332620] 00000000 f5ba02a0 c016c753 75e41aee 00000000 f6ad4080 75e41739 00000028
> [ 361.333479] 00000001 00000000 f6ad4080 f4361da4 c020882b 00000000 f6ad4080 75e40f23
> [ 361.334250] Call Trace:
> [ 361.334728] [<c016c753>] ? sched_clock+0x17/0x29
> [ 361.335272] [<c020882b>] ? sched_clock_cpu+0x1e2/0x20e
> [ 361.335781] [<c0f5a34e>] schedule+0xe3/0xf4
> [ 361.336182] [<c0f57361>] schedule_timeout+0x28/0x12b
> [ 361.336681] [<c023ce71>] ? mark_held_locks+0xc1/0xff
> [ 361.337156] [<c0f5d16d>] ? _raw_spin_unlock_irq+0x5f/0xa9
> [ 361.337652] [<c023d156>] ? trace_hardirqs_on_caller+0x2a7/0x332
> [ 361.338188] [<c023d208>] ? trace_hardirqs_on+0x27/0x37
> [ 361.338631] [<c0f5d180>] ? _raw_spin_unlock_irq+0x72/0xa9
> [ 361.339095] [<c0f59fd8>] __wait_for_common+0xfa/0x1a5
> [ 361.339534] [<c0f57339>] ? console_conditional_schedule+0x61/0x61
> [ 361.340119] [<c020628f>] ? try_to_wake_up+0x377/0x377
> [ 361.340561] [<c0f5a25a>] wait_for_completion+0x27/0x38
> [ 361.341014] [<c0372aa1>] writeback_inodes_sb_nr+0x122/0x13b
> [ 361.341502] [<c0f59f38>] ? __wait_for_common+0x5a/0x1a5
> [ 361.341963] [<c0372bee>] writeback_inodes_sb+0x3a/0x4c
> [ 361.342413] [<c037843a>] __sync_filesystem+0x3f/0xa8
> [ 361.342848] [<c037850e>] sync_filesystem+0x6b/0xa8
> [ 361.343274] [<c033782b>] generic_shutdown_super+0x56/0x18c
> [ 361.343833] [<c0337991>] kill_block_super+0x30/0xd2
> [ 361.344418] [<c0337b0f>] deactivate_locked_super+0x3e/0xb9
> [ 361.344919] [<c0338bf3>] deactivate_super+0x69/0x7a
> [ 361.345350] [<c0360827>] mntput_no_expire+0x23b/0x24e
> [ 361.345795] [<c036229c>] sys_umount+0x5f4/0x60c
> [ 361.346199] [<c03622d4>] sys_oldumount+0x20/0x30
> [ 361.346607] [<c0f5d668>] syscall_call+0x7/0xb
> [ 361.347027] 1 lock held by umount/3426:

Yes, these kinds of glitches are possible. The test tries to stress the fs
very hard, and sometimes the IO becomes so fragmented that the extent
layout of 'buffered-aio-verifier' looks as follows:

Level Entries       Logical           Physical        Length Flags
 0/ 2   1/  2      75 - 2140016   33412            2139942
 1/ 2   1/302      75 -    2978   98945               2904
 2/ 2   1/ 62      75 -      75 2617227 - 2617227        1
 2/ 2   2/ 62      79 -      79  246147 -  246147        1
 2/ 2   3/ 62     161 -     161 2119435 - 2119435        1
 2/ 2   4/ 62     331 -     331 2077134 - 2077134        1
 2/ 2   5/ 62     372 -     372 1285910 - 1285910        1
 2/ 2   6/ 62     400 -     400 1285938 - 1285938        1
 2/ 2   7/ 62     478 -     478 1286016 - 1286016        1
 2/ 2   8/ 62     490 -     490 1286028 - 1286028        1
 2/ 2   9/ 62     548 -     548 1286086 - 1286086        1
 2/ 2  10/ 62     555 -     555 1286093 - 1286093        1
 2/ 2  11/ 62     559 -     559 1286097 - 1286097        1
 2/ 2  12/ 62     665 -     665 2105779 - 2105779        1
 2/ 2  13/ 62     667 -     667 1286401 - 1286401        1

As a result, the blktraces also look sub-optimal:

253,3 1 91 2.431844430 6049 Q W 19368784 + 8 [flush-253:3]
253,3 1 92 2.432439483 6049 Q W 19368912 + 8 [flush-253:3]
253,3 1 93 2.433015550 6049 Q W 19369432 + 8 [flush-253:3]
253,3 1 94 2.433562426 6049 Q W 19370184 + 8 [flush-253:3]
253,3 1 95 2.434084419 6049 Q W 19370416 + 8 [flush-253:3]
253,3 1 96 2.434692946 6049 Q W 19372064 + 8 [flush-253:3]
253,3 1 97 2.434976250 6049 Q W 19372208 + 8 [flush-253:3]

IMHO it is not a bad idea to have at least one test which forces the fs
to handle a very unfriendly workload. In fact, in terms of bugs uncovered,
this test has turned out to be the most productive for me.

> [ 361.347361] #0: (&type->s_umount_key#18){++++..}, at: [<c0338bde>] deactivate_super+0x54/0x7a
> [16:40:14] [failed, exit status 1] - output mismatch (see /root/xfstests/results/generic/299.out.bad)
> --- tests/generic/299.out 2013-04-05 21:41:17.000000000 -0400
> +++ /root/xfstests/results/generic/299.out.bad 2013-04-12 16:40:14.678565323 -0400
> @@ -3,3 +3,6 @@
> Run fio with random aio-dio pattern
>
> Start fallocate/truncate loop
> +./common/rc: line 2055: 3353 Segmentation fault "$@" >> $seqres.full 2>&1

Yes, this is a known issue. You are probably using a recent fio.git HEAD.
Jens does a good job developing fio, but he tends to commit random
untested crap to his git tree, so stability is worse than it should be.
I have a known-good commit (aeb32dfccbd05) which works for me, and I
suggest using it.

> +failed: '/root/xfstests/bin/fio /tmp/3152-299.fio'
> +(see /root/xfstests/results/generic/299.full for details)
> ...
> (Run 'diff -u tests/generic/299.out /root/xfstests/results/generic/299.out.bad' to see the entire diff)
> Ran: generic/299
> Failures: generic/299
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
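
To put a number on how scattered the writeback in the blktrace excerpt
quoted earlier in this message is, here is a minimal Python sketch. It
assumes the blkparse column layout seen in that excerpt (device, CPU,
sequence, timestamp, PID, action, RWBS, start sector, "+", sector count).
The helper name summarize_queue_events is made up for illustration and is
not part of blktrace or xfstests.

```python
def summarize_queue_events(lines):
    """Return (request_count, contiguous_pairs, gaps) for queued ('Q') requests.

    gaps[i] is the number of sectors skipped between the end of request i
    and the start of request i + 1; 0 means the two are back to back.
    """
    requests = []
    for line in lines:
        fields = line.split()
        # Assumed layout: dev cpu seq time pid action rwbs sector + count [proc]
        if len(fields) < 10 or fields[5] != "Q" or fields[8] != "+":
            continue
        requests.append((int(fields[7]), int(fields[9])))

    gaps = [
        next_start - (start + count)
        for (start, count), (next_start, _) in zip(requests, requests[1:])
    ]
    contiguous = sum(1 for gap in gaps if gap == 0)
    return len(requests), contiguous, gaps


# The seven lines from the blktrace excerpt in this message.
TRACE = """\
253,3 1 91 2.431844430 6049 Q W 19368784 + 8 [flush-253:3]
253,3 1 92 2.432439483 6049 Q W 19368912 + 8 [flush-253:3]
253,3 1 93 2.433015550 6049 Q W 19369432 + 8 [flush-253:3]
253,3 1 94 2.433562426 6049 Q W 19370184 + 8 [flush-253:3]
253,3 1 95 2.434084419 6049 Q W 19370416 + 8 [flush-253:3]
253,3 1 96 2.434692946 6049 Q W 19372064 + 8 [flush-253:3]
253,3 1 97 2.434976250 6049 Q W 19372208 + 8 [flush-253:3]
"""

count, contiguous, gaps = summarize_queue_events(TRACE.splitlines())
print(count, contiguous, gaps)
```

For those seven requests, each one is 8 sectors long and none starts
where the previous one ended, so no two neighbouring requests in the
excerpt are back to back on disk.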