On Fri, 2008-12-19 at 08:03 +0100, Nick Piggin wrote:
> On Thu, Dec 18, 2008 at 10:54:57PM -0800, Dave Hansen wrote:
> > On Fri, 2008-12-19 at 07:19 +0100, Nick Piggin wrote:
> > > @@ -369,24 +283,34 @@ static int mnt_make_readonly(struct vfsm
> > >  {
> > >  	int ret = 0;
> > > 
> > > -	lock_mnt_writers();
> > > +	spin_lock(&vfsmount_lock);
> > > +	mnt->mnt_flags |= MNT_WRITE_HOLD;
> > >  	/*
> > > -	 * With all the locks held, this value is stable
> > > +	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
> > > +	 * should be visible before we do.
> > >  	 */
> > > -	if (atomic_read(&mnt->__mnt_writers) > 0) {
> > > +	smp_mb();
> > > +
> > > +	/*
> > > +	 * With writers on hold, if this value is zero, then there are definitely
> > > +	 * no active writers (although held writers may subsequently increment
> > > +	 * the count, they'll have to wait, and decrement it after seeing
> > > +	 * MNT_READONLY).
> > > +	 */
> > > +	if (count_mnt_writers(mnt) > 0) {
> > >  		ret = -EBUSY;
> > 
> > OK, I think this is one of the big races inherent with this approach.
> > There's nothing in here to ensure that no one is in the middle of an
> > update during this code.  The preempt_disable() will, of course, reduce
> > the window, but I think there's still a race here.
> 
> MNT_WRITE_HOLD is set, so any writer that has already made it past
> the MNT_WANT_WRITE loop will have its count visible here. Any writer
> that has not made it past that loop will wait until the slowpath
> completes and then the fastpath will go on to check whether the
> mount is still writeable.

Ahh, got it.  I'm slowly absorbing the barriers.  Not the normal way I
code.

I thought there was another race with MNT_WRITE_HOLD since mnt_flags
isn't really managed atomically.  But, since it is only ever modified
under vfsmount_lock, I think it is OK.

I also wondered whether there was a possibility of getting a spurious
-EBUSY when remounting r/w->r/o.  But that turned out to happen only
when the fs was *already* r/o, so that looks good.

While this has cleared out a huge amount of complexity, I can't stop
wondering whether it could be done with slightly more "normal"
operations.  I'm pretty sure I couldn't have come up with this by
myself, and I'm a bit worried that I wouldn't be able to find a race
in it if one reared its ugly head.

Is there a real good reason to allocate the percpu counters
dynamically?  Might as well stick them in the vfsmount and let the one
kmem_cache_zalloc() in alloc_vfsmnt() do a slightly larger allocation.
Did you think that was going to bloat it into a compound allocation or
something?  I hate the #ifdefs. :)

-- Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
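
[Editorial sketches, appended for context]

To make the barrier pairing Nick describes above concrete, here is a minimal
sketch of the writer fast path that mnt_make_readonly() synchronizes against.
This is not the patch itself; it assumes the usual kernel headers, and helper
names such as inc_mnt_writers()/dec_mnt_writers() are stand-ins for whatever
the real series uses:

	/* Writer fast path (sketch): take a per-cpu write reference. */
	int mnt_want_write(struct vfsmount *mnt)
	{
		int ret = 0;

		preempt_disable();
		inc_mnt_writers(mnt);		/* per-cpu increment, no lock */
		/*
		 * Make our increment visible before testing MNT_WRITE_HOLD.
		 * This pairs with the smp_mb() after the MNT_WRITE_HOLD store
		 * in mnt_make_readonly(): either that side sees our count, or
		 * we see its hold bit (or both).
		 */
		smp_mb();
		while (mnt->mnt_flags & MNT_WRITE_HOLD)
			cpu_relax();		/* wait for the r/o transition to settle */
		/*
		 * The hold is gone, so MNT_READONLY is now stable: either the
		 * remount failed (we may write) or it succeeded (back out).
		 */
		smp_rmb();
		if (mnt->mnt_flags & MNT_READONLY) {
			dec_mnt_writers(mnt);
			ret = -EROFS;
		}
		preempt_enable();
		return ret;
	}

And a rough illustration, again with made-up field and helper names, of the
"stick the counters in the vfsmount" idea from the last paragraph, so that the
single kmem_cache_zalloc() in alloc_vfsmnt() covers (and zeroes) them:

	struct mnt_writer_count {
		int count;
	} ____cacheline_aligned_in_smp;

	struct vfsmount_sketch {
		/* ... the usual vfsmount fields ... */
		int mnt_flags;
	#ifdef CONFIG_SMP
		/* embedded per-cpu counts; zeroed by kmem_cache_zalloc() */
		struct mnt_writer_count mnt_writers[NR_CPUS];
	#else
		int mnt_writers;
	#endif
	};

	static unsigned int count_mnt_writers_sketch(struct vfsmount_sketch *mnt)
	{
	#ifdef CONFIG_SMP
		unsigned int count = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			count += mnt->mnt_writers[cpu].count;
		return count;
	#else
		return mnt->mnt_writers;
	#endif
	}

The trade-off being asked about: the embedded array avoids a second allocation
but grows struct vfsmount by NR_CPUS cache lines, whereas a dynamically
allocated percpu object keeps the struct small at the cost of an extra
allocation and the #ifdef noise.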