On Fri, May 25, 2012 at 01:03:04PM -0400, Peter Watkins wrote:
> On Fri, May 25, 2012 at 2:28 AM, Juerg Haefliger <juergh@xxxxxxxxx> wrote:
> >> Does your kernel have the effect of
> >>
> >> 0bf6a5bd4b55b466964ead6fa566d8f346a828ee xfs: convert the xfsaild
> >> thread to a workqueue
> >
> > No.
> >
> >
> >> c7eead1e118fb7e34ee8f5063c3c090c054c3820 xfs: revert to using a
> >> kthread for AIL pushing
> >
> > No.
> >
> >
> >> In particular, is this code in xfs_trans_ail_push:
> >>
> >>	smp_wmb();
> >>	xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);
> >>	smp_wmb();
> >
> > No. xfs_trans_ail_push looks like this:
> >
> > void
> > xfs_trans_ail_push(
> > 	struct xfs_ail	*ailp,
> > 	xfs_lsn_t	threshold_lsn)
> > {
> > 	xfs_log_item_t	*lip;
> >
> > 	lip = xfs_ail_min(ailp);
> > 	if (lip && !XFS_FORCED_SHUTDOWN(ailp->xa_mount)) {
> > 		if (XFS_LSN_CMP(threshold_lsn, ailp->xa_target) > 0)
> > 			xfsaild_wakeup(ailp, threshold_lsn);
> > 	}
> > }
> >
> >
> > FWIW, the XFS driver in my kernel is identical to the vanilla 2.6.38
> > driver. I'm still trying to get a XFS trace from a production hang. I
> > do have a crash dump from a production machine with /tmp hanging.
> > Would it be helpful to share that dump?
> >
> > ...Juerg
>
> It looks like the combined effect of those patches, perhaps the write
> barriers, fix one log space hang. That problem exists in 2.6.38.

There are a huge number of fixes to solve these problems since
2.6.38. It doesn't help us at all to test anymore on 2.6.38,
especially as that kernel is not supported, and I'd suggest that you
migrate production off it sooner rather than later.

> Reading bug #922 I see your test case reproduces in recent kernels, so
> there must be a newer problem also.

Right, that's what we need to find - it appears to be a CIL
stall/accounting leak, completely unrelated to all the other AIL/log
space stalls that have been occurring.
Last thing is that I was waiting for more information on the stall
that Mark T @ sgi was able to reproduce. I haven't heard anything
from him since I asked for more information on May 23....

> I find the reproducer the most useful, so no need to upload the dump.

At this point, running on a 3.5-rc1 kernel is what we need to get
working reliably. Once we have the problems solved there, we can
work out what set of patches needs to be backported to 3.0-stable and
other kernels to fix the problems in those supported kernels...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx