On Thu, Jul 25, 2013 at 10:02:40AM -0500, Mark Tinguely wrote:
> On 07/24/13 19:21, Dave Chinner wrote:
> >On Wed, Jul 24, 2013 at 08:28:42AM -0500, Mark Tinguely wrote:
> >>If you could please redo the test and get the stack traces with
> >>/proc/sysrq-trigger and, if your kernel works with crash, a core dump.
> >>For the stack trace, I mostly want to know if it has several
> >>"xlog_grant_head_wait" entries in it, because ...
> >>
> >>...I seem to have triggered a couple of log space reservation hangs
> >>with fsstress on one XFS partition and a mega-copy on another
> >>partition, but will have to graft the new XFS tree onto a Linux 3.10
> >>kernel to get crash (and one of my SATA controllers) to work again.
> >
> >They are unrelated to this patchset.
> >
> >Somewhere in the code there
> >is a mismatch between what we reserve as the base requirement for an
> >actual log write and what the CIL actually steals, and that is, most
> >likely, what is leading to log hangs.
> >
> >This is demonstrable in the fact that generic/070 on 512 byte
> >block size filesystems regularly hits a transaction reservation
> >exhausted assert failure on transaction commit of the periodic log
> >dummy transaction on my test rigs.
> >
> >Cheers,
> >
> >Dave.
>
> In testing patch 44, I did not trip over any CIL stealing asserts
> before the hang. I think the CIL steal assert is a different and a
> legitimate complaint. When I tripped over the ASSERT with the v3
> inode enabled, the writeid only reserves space for the sb but there
> were occasions of root btree and attribute fork entries that were also
> logged.
>
> Patch 43 runs for hours without incident. Previous to this series, I
> ran the same tests with parent pointer testing with much higher log
> reservations for a day or two and never got a hang.
>
> I tested patch 44 with copy-like tests and it hung both
> times - not a convincing number of tests.
> A quick look: I see an
> empty AIL, empty CIL, the CTX is using 0 bytes, it doesn't look like
> there are any CIL pushes going nor any older ctx, and the ctx has an
> empty ticket reservation. The log tail is 0xd000014d7 and
> reserve/grant is 0xe00204d04. The next reservation is for a rename
> transaction that uses just over the log space left. There has to be
> a log space leak. I will go back to patch 43 on one machine and patch
> 44 on another and make sure it is patch 44 that is causing the problem.

Right, a patch that makes transaction commits go faster is likely to
cause a pre-existing reservation leak to leak faster....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
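
For anyone wanting to check the "just over the log space left" arithmetic
from the tail and reserve/grant values quoted above, here is a rough sketch.
It assumes both values are packed 64-bit quantities with the cycle number in
the upper 32 bits, that the tail offset is in 512-byte basic blocks, and that
the grant head offset is in bytes. The helper names are made up for
illustration, and the logic only loosely follows my reading of
xlog_space_left(); it is not copied from the kernel.

```python
BBSHIFT = 9  # assumption: log block offsets are in 512-byte basic blocks


def split_packed(value):
    """Split a packed 64-bit value into (cycle, offset).

    Assumes the cycle lives in the upper 32 bits and the offset in the
    lower 32 bits.
    """
    return value >> 32, value & 0xFFFFFFFF


def space_left(tail_lsn, grant_head, log_size=None):
    """Estimate free log space in bytes between a grant head and the tail.

    Hypothetical helper. log_size (bytes) is only needed when head and
    tail are in the same cycle.
    """
    tail_cycle, tail_block = split_packed(tail_lsn)
    tail_bytes = tail_block << BBSHIFT          # tail offset is in blocks
    head_cycle, head_bytes = split_packed(grant_head)  # head offset in bytes

    if tail_cycle == head_cycle and head_bytes >= tail_bytes:
        # Head has not wrapped: free space is everything except head - tail.
        if log_size is None:
            raise ValueError("log_size is required when cycles match")
        return log_size - (head_bytes - tail_bytes)
    if tail_cycle + 1 < head_cycle:
        # Head is more than one cycle ahead: treat the log as full.
        return 0
    if tail_cycle < head_cycle:
        # Head has wrapped once: it may only advance up to the tail.
        return tail_bytes - head_bytes
    raise ValueError("grant head appears to be behind the log tail")


if __name__ == "__main__":
    free = space_left(0xd000014d7, 0xe00204d04)
    print(split_packed(0xd000014d7), split_packed(0xe00204d04), free)
```

Under those assumptions the tail decodes to cycle 13, block 0x14d7 and the
grant head to cycle 14, byte 0x204d04, which leaves 614652 bytes free. With
the AIL and CIL both empty there is nothing left to push the tail forward,
so a reservation larger than that would sleep indefinitely.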