On Tue, Feb 18, 2014 at 03:02:52PM +0100, Jan Kara wrote:
> On Wed 19-02-14 00:29:24, Dave Chinner wrote:
> > OK, I suspect that there are other problems lurking here, too. I just
> > hit a problem on generic/068 on a ramdisk on XFS where a sync call
> > would never complete until the writer processes were killed. fsstress
> > got stuck here:
> >
> > [222229.551097] fsstress D ffff88021bc13180 4040 5898 5896 0x00000000
> > [222229.551097] ffff8801e5c2dd68 0000000000000086 ffff880219eb1850 0000000000013180
> > [222229.551097] ffff8801e5c2dfd8 0000000000013180 ffff88011b2b0000 ffff880219eb1850
> > [222229.551097] ffff8801e5c2dd48 ffff8801e5c2de68 ffff8801e5c2de70 7fffffffffffffff
> > [222229.551097] Call Trace:
> > [222229.551097] [<ffffffff811db930>] ? fdatawrite_one_bdev+0x20/0x20
> > [222229.551097] [<ffffffff81ce35e9>] schedule+0x29/0x70
> > [222229.551097] [<ffffffff81ce28c1>] schedule_timeout+0x171/0x1d0
> > [222229.551097] [<ffffffff810b0eda>] ? __queue_delayed_work+0x9a/0x170
> > [222229.551097] [<ffffffff810b0b41>] ? try_to_grab_pending+0xc1/0x180
> > [222229.551097] [<ffffffff81ce434f>] wait_for_completion+0x9f/0x110
> > [222229.551097] [<ffffffff810c7810>] ? try_to_wake_up+0x2c0/0x2c0
> > [222229.551097] [<ffffffff811d3c4a>] sync_inodes_sb+0xca/0x1f0
> > [222229.551097] [<ffffffff811db930>] ? fdatawrite_one_bdev+0x20/0x20
> > [222229.551097] [<ffffffff811db94c>] sync_inodes_one_sb+0x1c/0x20
> > [222229.551097] [<ffffffff811af219>] iterate_supers+0xe9/0xf0
> > [222229.551097] [<ffffffff811dbb32>] sys_sync+0x42/0xa0
> > [222229.551097] [<ffffffff81cf0d29>] system_call_fastpath+0x16/0x1b
> >
> > This then held off the filesystem freeze due to holding s_umount,
> > and the two fstest processes just kept running, dirtying the
> > filesystem. It wasn't until I killed the fstests processes by removing
> > the tmp file that the sync completed and the test made progress.
>   OK, so the flusher thread (or actually the corresponding kworker) was
> continuously writing the newly dirtied data? So far I didn't reproduce this
> but I'll try...

No, the flusher thread was nowhere to be found.

> > It's reproducible, and I left it for a couple of hours to see if
> > it would resolve itself. It didn't, so I had to kick it to break the
> > livelock.
>   I wonder whether it might be some incarnation of a bug fixed here:
> https://lkml.org/lkml/2014/2/14/733
>
> The effects should be somewhat different but it's in that area. Can you try
> with that patch?

Seems to have fixed the problem. generic/068 has just passed 3 times
in a row, and it's never passed before on this ramdisk based test
rig. Thanks for the pointer, Jan!

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx