On Fri, 13 Jul 2012, Mike Galbraith wrote:
> On Thu, 2012-07-12 at 15:31 +0200, Thomas Gleixner wrote:
> > Bingo, that makes it more likely that this is caused by copying w/o
> > initializing the lock and then freeing the original structure.
> >
> > A quick check for memcpy finds that __btrfs_close_devices() does a
> > memcpy of btrfs_device structs w/o initializing the lock in the new
> > copy, but I have no idea whether that's the place we are looking for.
>
> Thanks a bunch Thomas. I doubt I would have ever figured out that lala
> land resulted from _copying_ a lock. That's one I won't be forgetting
> any time soon. Box not only survived a few thousand xfstests 006 runs,
> dbench seemed disinterested in deadlocking virgin 3.0-rt.

Cute. I think that the lock copying caused the deadlock problem, as
the list pointed to the wrong place. So we might have ended up
following the wrong chain when walking the list, as long as the
original struct was not freed. That beast is freed under RCU, so there
could be an RCU read-side critical section fiddling with the old lock
and causing utter confusion.

/me goes and writes a nastigram^W proper changelog

> btrfs still locks up in my enterprise kernel, so I suppose I had better
> plug your fix into 3.4-rt and see what happens, and go beat hell out of
> virgin 3.0-rt again to be sure box really really survives dbench.

A test against 3.4-rt sans enterprise mess might be nice as well.

Thanks,

	tglx
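
For illustration only, here is a minimal userspace sketch of the bug class discussed above: a lock that embeds a self-referential wait list gets copied with memcpy, so the copy's list pointers still point into the original. All the toy_* names are hypothetical stand-ins, not the kernel's rt_mutex, list_head or btrfs_device definitions, and this is not the actual btrfs fix, only the general pattern of re-initializing the lock in the new copy.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins (hypothetical, NOT the kernel types). */
struct toy_list_head {
	struct toy_list_head *next, *prev;
};

struct toy_lock {
	int locked;
	struct toy_list_head wait_list;	/* an empty list points at itself */
};

struct toy_device {
	int devid;
	struct toy_lock lock;
};

/* Initialize the lock: unlocked, with an empty (self-pointing) wait list. */
static void toy_lock_init(struct toy_lock *l)
{
	l->locked = 0;
	l->wait_list.next = &l->wait_list;
	l->wait_list.prev = &l->wait_list;
}

/* Returns 1 if the wait list is empty and points at this lock itself. */
static int toy_lock_list_is_sane(const struct toy_lock *l)
{
	return l->wait_list.next == &l->wait_list &&
	       l->wait_list.prev == &l->wait_list;
}

/*
 * Buggy pattern: the embedded lock is copied byte for byte, so the
 * copy's wait_list.next/prev still point into the ORIGINAL struct.
 * Walking that list from the copy follows the wrong chain, and it
 * dangles once the original is freed.
 */
static void toy_device_copy_buggy(struct toy_device *dst,
				  const struct toy_device *src)
{
	assert(dst != src);	/* memcpy needs non-overlapping objects */
	memcpy(dst, src, sizeof(*dst));
}

/* Fixed pattern: copy the payload, then re-initialize the lock in place. */
static void toy_device_copy_fixed(struct toy_device *dst,
				  const struct toy_device *src)
{
	assert(dst != src);
	memcpy(dst, src, sizeof(*dst));
	toy_lock_init(&dst->lock);
}
```

Calling toy_device_copy_buggy() on an initialized device leaves the copy failing toy_lock_list_is_sane(), because its list head points at the source; toy_device_copy_fixed() passes the check. In kernel code the analogous step would be calling the lock's own init function (e.g. spin_lock_init()) on the new copy after the memcpy.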