On Mon, May 14, 2012 at 03:34:49PM -0500, Ben Myers wrote: > I'm still hitting this on a regular basis. Here is some analysis from a recent > crash dump which you may want to skip. The fix is at the end. .... > =================================================================== > > xfs: use s_umount sema in xfs_sync_worker > > xfs_sync_worker checks the MS_ACTIVE flag in sb->s_flags to avoid doing work > during mount and unmount. This flag can be cleared by unmount after the > xfs_sync_worker checks it but before the work is completed. Then there are problems all over the place in different filesystems if the straight MS_ACTIVE check is not sufficient. > Protect xfs_sync_worker by using the s_umount semaphore at the read level to > provide exclusion with unmount while work is progressing. I don't think that is the right fix for the given problem. The problem is, as you've stated: "Looks like the problem is that the sync worker is still running after the log has been torn down, and it calls xfs_fs_log_dummy which generates log traffic." Why did we allow a new transaction to start while/after the log was torn down? Isn't that the problem we need to fix because it leads to invalid entries in the physical log that might cause recovery failures? Further, any asynchronous worker thread that does transactions could have this same problem regardless of whether we are umounting or cleaning up after a failed mount, so it is not unique to the xfs_sync_worker.... That is, if we've already started to tear down or torn down the log, we must not allow new transactions to start. Likewise, we can't finalise tear down the log until transactions in progress have completed. Using the s_umount lock here avoids then race, but it really is a VFS level lock not an filesystem level lock) and is, IMO, just papering over the real problem.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs