On Fri, 2008-12-19 at 07:19 +0100, Nick Piggin wrote:
> @@ -369,24 +283,34 @@ static int mnt_make_readonly(struct vfsm
>  {
>  	int ret = 0;
>
> -	lock_mnt_writers();
> +	spin_lock(&vfsmount_lock);
> +	mnt->mnt_flags |= MNT_WRITE_HOLD;
>  	/*
> -	 * With all the locks held, this value is stable
> +	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
> +	 * should be visible before we do.
>  	 */
> -	if (atomic_read(&mnt->__mnt_writers) > 0) {
> +	smp_mb();
> +
> +	/*
> +	 * With writers on hold, if this value is zero, then there are definitely
> +	 * no active writers (although held writers may subsequently increment
> +	 * the count, they'll have to wait, and decrement it after seeing
> +	 * MNT_READONLY).
> +	 */
> +	if (count_mnt_writers(mnt) > 0) {
>  		ret = -EBUSY;

OK, I think this is one of the big races inherent in this approach.
Nothing here ensures that no one is in the middle of an update while
this code runs. The preempt_disable() will, of course, reduce the
window, but I think there's still a race here.

Is this where you wanted to put the synchronize_rcu()? That's a nice
touch, because *that* will ensure that no one is in the middle of an
increment here and that they will, at worst, be blocking on the
MNT_WRITE_HOLD thing.

I kinda remember going down this path a few times, but you may have
cracked the problem. Dunno. I need to stare at the code a bit more
before I'm convinced. I'm optimistic, but a bit skeptical this can
work. :)

I am really wondering where all the cost is that you're observing in
those benchmarks. Have you captured any profiles by chance?

-- Dave