On Tue, Apr 23, 2019 at 07:15:44PM -0700, Davidlohr Bueso wrote:
> On Wed, 03 Apr 2019, Daniel Jordan wrote:
>
> > On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
> > > Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
> > > > With locked_vm now an atomic, there is no need to take mmap_sem as
> > > > writer. Delete and refactor accordingly.
> > >
> > > Could you please detail the change?
> >
> > Ok, I'll be more specific in the next version, using some of your language in
> > fact. :)
> >
> > > It looks like this is not the only change. I'm wondering what the
> > > consequences are.
> > >
> > > Before we did:
> > > - lock
> > > - calculate future value
> > > - check the future value is acceptable
> > > - update value if future value acceptable
> > > - return error if future value not acceptable
> > > - unlock
> > >
> > > Now we do:
> > > - atomic update with future (possibly too high) value
> > > - check the new value is acceptable
> > > - atomic update back with older value if new value not acceptable, and
> > >   return error
> > >
> > > So if a concurrent action wants to increase locked_vm with an acceptable
> > > step while another one has temporarily set it too high, it will now fail.
> > >
> > > I think we should keep the previous approach and do a cmpxchg after
> > > validating the new value.
>
> Wouldn't the cmpxchg alternative also be exposed to locked_vm changing between
> validating the new value and the cmpxchg(), so we'd bogusly fail even when
> there is still room, just because the value changed? (I'm assuming we don't
> hold any locks, otherwise all this is pointless.)
>
>   current_locked = atomic_read(&mm->locked_vm);
>   new_locked = current_locked + npages;
>   if (new_locked < lock_limit)
>      if (cmpxchg(&mm->locked_vm, current_locked, new_locked) == current_locked)
>            /* ENOMEM */

Well, it needs a loop..

again:
   current_locked = atomic_read(&mm->locked_vm);
   new_locked = current_locked + npages;
   if (new_locked < lock_limit)
      if (cmpxchg(&mm->locked_vm, current_locked, new_locked) != current_locked)
            goto again;

So it won't have bogus failures, as there is no unwind after error. Basically
this is a load-locked/store-conditional style of locking pattern.

> > That's a good idea, and especially worth doing considering that an arbitrary
> > number of threads that charge a low amount of locked_vm can fail just because
> > one thread charges lots of it.
>
> Yeah, but the window for this is quite small, I doubt it would be a real issue.
>
> What if, before doing the atomic_add_return(), we first did the racy new_locked
> check for ENOMEM, then do the speculative add and clean up if necessary. This
> would further reduce the scope of the window where a false ENOMEM can occur.
>
> > pinned_vm appears to be broken the same way, so I can fix it too unless someone
> > beats me to it.
>
> This should not be a surprise for the rdma folks. Cc'ing Jason nonetheless.

I think we accepted this tiny race as a side effect of removing the lock, which
was very beneficial. Really, the time window between the atomic failing and the
unwind is very small, and there are enough other ways a hostile user could DOS
locked_vm that I don't think it really matters in practice.

However, the cmpxchg seems better, so a helper to implement that would probably
be the best thing to do.

Jason
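
A minimal sketch of the kind of cmpxchg-based helper being suggested, built
from the loop above. The name locked_vm_try_charge(), the exact signature, and
the use of atomic_long_t are illustrative assumptions, not something taken from
this thread or from the series under discussion:

   #include <linux/atomic.h>
   #include <linux/errno.h>
   #include <linux/mm_types.h>

   /*
    * Hypothetical helper: charge npages against mm->locked_vm without
    * taking mmap_sem, retrying the cmpxchg until it either succeeds or
    * the limit would be exceeded.  Assumes locked_vm is an atomic_long_t;
    * the real series may use a different atomic type.
    */
   static int locked_vm_try_charge(struct mm_struct *mm, unsigned long npages,
                                   unsigned long lock_limit)
   {
           long current_locked, new_locked;

           do {
                   current_locked = atomic_long_read(&mm->locked_vm);
                   new_locked = current_locked + npages;
                   /* Validate before publishing: no unwind needed on failure. */
                   if (new_locked > lock_limit)
                           return -ENOMEM;
           } while (atomic_long_cmpxchg(&mm->locked_vm, current_locked,
                                        new_locked) != current_locked);

           return 0;
   }

Unlike the speculative atomic_add_return() approach, a value above the limit is
never made visible, so concurrent small charges cannot spuriously fail because
another thread has temporarily over-committed locked_vm.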