Re: [PATCH] xfs: use s_umount sema in xfs_sync_worker

Ben Myers <bpm@xxxxxxx> · Thu, 24 May 2012 17:39:52 -0500

Hey Dave,

On Thu, May 17, 2012 at 05:16:58PM +1000, Dave Chinner wrote:
> On Wed, May 16, 2012 at 12:04:03PM -0500, Ben Myers wrote:
> > I think you have a good point here, but this isn't limited to transactions.
> 
> I'm pretty sure that the other async threads/wqs are safe w.r.t.
> startup and shutdown. Indeed, some of them have to run while
> mount/unmount are running so must have their own startup/shutdown
> synchronisation. Hence we need to treat each different async work on
> a case by case basis...

I'd still like to understand them all... and there are plenty of them.

> > We shouldn't even call xfs_log_need_covered without some
> > protection from xfs_fs_put_super; xfs_fs_writable doesn't cut the
> > mustard.
> 
> Sure, but that check is for a different purpose so it's no surprise
> at all that it doesn't help this case.

Yep.

> AFAICT, we simply don't care if xfs_fs_put_super() has been called -
> what matters is the state of the log at the time
> xfs_log_need_covered() is called.  i.e. xfs_log_need_covered()
> should error out if the log is being recovered or being torn down.
> Same for xfs_log_force(), xfs_log_reserve(), etc. Right now we don't
> even check if we can do the operation safely or not.

Erroring out is one option, but I suggest that we shouldn't necessarily
characterize a torn down log as an error, unless we plan to have some other
means of synchronization.  If we have some other means of synch, then it's fine
to warn or assert or error on these when the log is already torn down.

> Right now we assume that new transactions or log operations cannot
> occur (i.e they are stopped) before we tear down the log.  That
> means we either need to prevent transactions from starting once we
> start tearing down the log, or we need to stop the xfs_sync_worker
> before we tear down the log. The second is how we currently avoid
> this problem, so adding a cancel_delayed_work_sync(&mp->m_sync_work)
> to xfs_log_unmount() would probably solve the problem....

cancel_delayed_work_sync waits for all workers to finish before returning, so
that should work fine.  Is there any point in covering the log after the
unmount record has been written?  We should cancel the work before writing that
record.  Hrm.  Maybe xfs_log_need_covered handles that already..  Anyway, I'll
make some time to work on this tomorrow so I can test it over the weekend.

Regards,
	Ben

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs