Re: [rfc][patch 1/2] mnt_want_write speedup 1

On Fri, 2008-12-19 at 08:03 +0100, Nick Piggin wrote:
> On Thu, Dec 18, 2008 at 10:54:57PM -0800, Dave Hansen wrote:
> > On Fri, 2008-12-19 at 07:19 +0100, Nick Piggin wrote:
> > > @@ -369,24 +283,34 @@ static int mnt_make_readonly(struct vfsm
> > >  {
> > >         int ret = 0;
> > > 
> > > -       lock_mnt_writers();
> > > +       spin_lock(&vfsmount_lock);
> > > +       mnt->mnt_flags |= MNT_WRITE_HOLD;
> > >         /*
> > > -        * With all the locks held, this value is stable
> > > +        * After storing MNT_WRITE_HOLD, we'll read the counters. This store
> > > +        * should be visible before we do.
> > >          */
> > > -       if (atomic_read(&mnt->__mnt_writers) > 0) {
> > > +       smp_mb();
> > > +
> > > +       /*
> > > +        * With writers on hold, if this value is zero, then there are definitely
> > > +        * no active writers (although held writers may subsequently increment
> > > +        * the count, they'll have to wait, and decrement it after seeing
> > > +        * MNT_READONLY).
> > > +        */
> > > +       if (count_mnt_writers(mnt) > 0) {
> > >                 ret = -EBUSY;
> > 
> > OK, I think this is one of the big races inherent with this approach.
> > There's nothing in here to ensure that no one is in the middle of an
> > update during this code.  The preempt_disable() will, of course, reduce
> > the window, but I think there's still a race here.
> 
> MNT_WRITE_HOLD is set, so any writer that has already made it past
> the MNT_WANT_WRITE loop will have its count visible here. Any writer
> that has not made it past that loop will wait until the slowpath
> completes and then the fastpath will go on to check whether the
> mount is still writeable.

Ahh, got it.  I'm slowly absorbing the barriers.  Not the normal way I
code.
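
To make sure I have it straight, the writer fastpath pairing with the
slowpath above would look roughly like this (just my sketch from the
description in this thread, not the actual patch; inc_mnt_writers(),
dec_mnt_writers() and __mnt_is_readonly() stand in for whatever the
real helpers are called):

int mnt_want_write(struct vfsmount *mnt)
{
	int ret = 0;

	preempt_disable();
	inc_mnt_writers(mnt);		/* bump this CPU's write count */
	/*
	 * The increment must be visible before we sample MNT_WRITE_HOLD,
	 * so the slowpath's count_mnt_writers() (run after it sets
	 * MNT_WRITE_HOLD) cannot miss us.  Pairs with the smp_mb() in
	 * mnt_make_readonly().
	 */
	smp_mb();
	while (mnt->mnt_flags & MNT_WRITE_HOLD)
		cpu_relax();		/* wait for the slowpath to finish */
	/*
	 * The slowpath decides MNT_READONLY before clearing
	 * MNT_WRITE_HOLD, so don't look at the readonly state until the
	 * hold is gone.
	 */
	smp_rmb();
	if (__mnt_is_readonly(mnt)) {
		dec_mnt_writers(mnt);	/* back out; the remount won */
		ret = -EROFS;
	}
	preempt_enable();
	return ret;
}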

I thought there was another race with MNT_WRITE_HOLD since mnt_flags
isn't modified atomically.  But since it is only ever modified under
vfsmount_lock, I think it is OK.

I also wondered if there was a possibility of getting a spurious -EBUSY
when remounting r/w->r/o.  But that turned out to happen only when the
fs was *already* r/o.  So that looks good.

While this has cleared out a huge amount of complexity, I can't stop
wondering if this could be done with slightly more "normal" operations.
I'm pretty sure I couldn't have come up with this by myself, and I'm a
bit worried that I wouldn't be able to find a race in it if one reared
its ugly head.

Is there a really good reason to allocate the percpu counters
dynamically?  Might as well stick them in the vfsmount and let the one
kmem_cache_zalloc() in alloc_vfsmnt() do a slightly larger allocation.
Did you think that was going to bloat it into a compound allocation or
something?  I hate the #ifdefs. :)
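
Something along these lines is what I had in mind (purely illustrative;
the slot struct, field and helper names here are made up, and the
NR_CPUS-sized, cacheline-padded array is exactly what might blow up the
allocation):

/*
 * Hypothetical: one counter slot per CPU embedded in the vfsmount,
 * zeroed for free by the existing kmem_cache_zalloc() in alloc_vfsmnt().
 * Padding each slot to a cacheline avoids false sharing, but costs
 * NR_CPUS cachelines per mount.
 */
struct mnt_writer_slot {
	int count;
} ____cacheline_aligned_in_smp;

struct vfsmount {
	/* ... existing fields ... */
	struct mnt_writer_slot mnt_writers[NR_CPUS];
};

/* called with preemption disabled, as in the fastpath above */
static inline void inc_mnt_writers(struct vfsmount *mnt)
{
	mnt->mnt_writers[smp_processor_id()].count++;
}

static unsigned int count_mnt_writers(struct vfsmount *mnt)
{
	unsigned int count = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		count += mnt->mnt_writers[cpu].count;
	return count;
}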

-- Dave

