On Mon, 14 Apr 2014, Peter Zijlstra wrote:

> On Mon, Apr 14, 2014 at 03:08:36PM -0400, Brian Foster wrote:
> > On Mon, Apr 14, 2014 at 12:43:14PM -0400, Brian Foster wrote:
> > > Hi all,
> > >
> > > This is a heads up that I'm seeing a blatant readdir hang on the current
> > > for-next with selinux enabled. To reproduce, I format a clean fs, mount
> > > and attempt an ls.
> > >
> > > The problem does not occur with selinux disabled, if I back out the
> > > following commit:
> > >
> > > 40194ecc6d78 xfs: reinstate the ilock in xfs_readdir
> > >
> > > ... or if I remove the locking around xfs_attr_get(), so I suspect this
> > > is another instance of a recursive deadlock. I'm getting no output
> > > whatsoever in order to confirm this and it also leads to a complete
> > > system lockup. It's also interesting that this hasn't been observed
> > > until now, given the above commit was introduced in 3.14. So the above
> > > commit doesn't appear to be the most recent change that triggers this.
> > >
> > > I reproduced on the latest linus tree and do not reproduce on 3.14, so
> > > I'm trying to do a bisect to find out what else might have changed to
> > > trigger this.
> > >
> >
> > This bisected down to:
> >
> > commit 6f008e72cd111a119b5d8de8c5438d892aae99eb
> > Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > Date: Wed Mar 12 13:24:42 2014 +0100
> >
> > locking/mutex: Fix debug checks
> > ...
> >
> > ... which suggests something down in the mutex debug code. Indeed, the
> > problem no longer occurs if I disable kernel debug in my .config. What
> > is also interesting is that it didn't return when I reenable
> > DEBUG_KERNEL and DEBUG_MUTEXES alone. It does return when I start to
> > enable some of the other lock debugging options. FWIW, I also cleared
> > out my tree and rebuilt from scratch just to be sure that I didn't have
> > anything stale/broken lying around.
> >
> > Peter,
> >
> > Any insight on this?
>
> http://lkml.kernel.org/r/tip-a227960fe0cafcc229a8d6bb8b454a3a0b33719d@xxxxxxxxxxxxxx
>
> That will make the kernel continue after the lockdep splat. I too see it
> on some of my XFS using machines.

It can happen on JFS, too, but my trusty "untar a system backup until a
splat happens" test barely worked for the merge-window kernel. Therefore,
I used xfstests generic/113 on XFS (kernel + xfs-oss/for-next) to trigger
the problem. The patch above has been through xfstests on both v4- and
v5-superblock XFS, and it resolved the new lockdep issues I was seeing
here on x86.

Sorry for not reporting it here earlier, which cost you the time spent
on a bisect. The second lockdep splat I got after kernel 3.14 wasn't in
XFS, so I treated it as a non-XFS issue.

Good luck!

Michael

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs