On 07/24/13 19:21, Dave Chinner wrote:
On Wed, Jul 24, 2013 at 08:28:42AM -0500, Mark Tinguely wrote:
If you could please redo the test and get the stack traces with
/proc/sysrq-trigger and, if your kernel works with crash, a core dump.
For the stack trace, I mostly want to know if it has several
"xlog_grant_head_wait" entries in it, because ...
...I seem to have triggered a couple of log space reservation hangs
with fsstress on one XFS partition and a mega-copy on another
partition, but will have to graft the new XFS tree onto a Linux 3.10
kernel to get crash (and one of my sata controllers) to work again.
They are unrelated to this patchset.
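The check requested above (does the trace contain several "xlog_grant_head_wait" entries?) can be scripted once the task dump has been captured, for example from dmesg after `echo t > /proc/sysrq-trigger`. A minimal sketch, assuming each blocked task's stack mentions the symbol at most once; the helper name count_waiters is made up for illustration:

```python
def count_waiters(lines, symbol="xlog_grant_head_wait"):
    """Count stack-trace lines that mention `symbol`.

    `lines` is any iterable of text lines, e.g. an open file holding a
    saved task dump. This is a plain substring match, so it assumes the
    symbol shows up once per waiting task.
    """
    return sum(1 for line in lines if symbol in line)
```

For example, `count_waiters(open("sysrq-t.txt"))`, where sysrq-t.txt is a hypothetical file holding the saved dump. A result greater than one would suggest several tasks are queued waiting for log space.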
Somewhere in the code there
is a mismatch between what we reserve as the base requirement for an
actual log write and what the CIL actually steals, and that is, most
likely, what is leading to log hangs.
This is demonstrated by the fact that generic/070 on 512 byte
block size filesystems regularly hits a transaction reservation
exhausted assert failure on transaction commit of the periodic log
dummy transaction on my test rigs.
Cheers,
Dave.
In testing patch 44, I did not trip over any CIL stealing asserts before
the hang. I think the CIL steal assert is a different, legitimate
complaint. When I tripped over the ASSERT with the v3 inode enabled,
the writeid only reserves space for the sb, but there were occasions
where a root btree and an attribute fork entry were also logged.
Patch 43 runs for hours without incident. Prior to this series, I ran
the same tests with parent pointer testing with much higher log
reservations for a day or two and never got a hang.
I tested patch 44 twice with copy-like tests and it hung both times,
which is not a convincing number of tests. On a quick look, I see an
empty AIL and an empty CIL, the ctx is using 0 bytes, there do not appear
to be any CIL pushes in progress nor any older ctx, and the ctx has an
empty ticket reservation. The log tail is 0xd000014d7 and reserve/grant
is 0xe00204d04. The next reservation is for a rename transaction that
needs just over the log space left. There has to be a log space leak. I
will go back to patch 43 on one machine and run patch 44 on another to
make sure it is patch 44 that is causing the problem.
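
For reference, the two values above can be decoded by hand. This is a rough sketch, assuming the usual XFS encoding (cycle number in the upper 32 bits; the lower 32 bits hold a 512-byte basic block number for the tail LSN and a byte offset for the grant head) and a comparison modeled on what xlog_space_left() does as I understand it. The function names and ASSUMED_LOG_SIZE are made up for illustration; the real log size is not given here and is not used on the branch these values take.

```python
BBSHIFT = 9  # 512-byte basic blocks

def crack_lsn(lsn):
    """Split a 64-bit LSN into (cycle, basic block number)."""
    return lsn >> 32, lsn & 0xFFFFFFFF

def crack_grant_head(head):
    """Split a 64-bit grant head into (cycle, byte offset)."""
    return head >> 32, head & 0xFFFFFFFF

def space_left(tail_lsn, grant_head, log_size):
    """Estimate free log bytes between the grant head and the tail."""
    tail_cycle, tail_block = crack_lsn(tail_lsn)
    head_cycle, head_bytes = crack_grant_head(grant_head)
    tail_bytes = tail_block << BBSHIFT
    if tail_cycle == head_cycle and head_bytes >= tail_bytes:
        # Head has not wrapped: everything outside [tail, head) is free.
        return log_size - (head_bytes - tail_bytes)
    if tail_cycle + 1 < head_cycle:
        # Head is more than one cycle ahead: no space at all.
        return 0
    if tail_cycle < head_cycle:
        # Head has wrapped once: free space is the gap up to the tail.
        return tail_bytes - head_bytes
    # Inconsistent state (head behind tail); treat the log as empty.
    return log_size

ASSUMED_LOG_SIZE = 10 * 1024 * 1024  # hypothetical, unused for these values

tail = 0xd000014d7    # cycle 13, block 0x14d7
grant = 0xe00204d04   # cycle 14, byte offset 0x204d04
print(space_left(tail, grant, ASSUMED_LOG_SIZE))  # prints 614652
```

Under those assumptions the grant head is one cycle ahead of the tail with roughly 600 KB between them, so any reservation larger than that has to wait for the tail to move.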
--Mark.