On Tue, Apr 23, 2019 at 07:15:44PM -0700, Davidlohr Bueso wrote:
> On Wed, 03 Apr 2019, Daniel Jordan wrote:
>
> > On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
> > > Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
> > > > With locked_vm now an atomic, there is no need to take mmap_sem as
> > > > writer. Delete and refactor accordingly.
> > >
> > > Could you please detail the change?
> >
> > Ok, I'll be more specific in the next version, using some of your language in
> > fact. :)
> >
> > > It looks like this is not the only change. I'm wondering what the
> > > consequences are.
> > >
> > > Before we did:
> > > - lock
> > > - calculate future value
> > > - check the future value is acceptable
> > > - update value if future value acceptable
> > > - return error if future value not acceptable
> > > - unlock
> > >
> > > Now we do:
> > > - atomic update with future (possibly too high) value
> > > - check the new value is acceptable
> > > - atomic update back with older value if new value not acceptable, and
> > >   return error
> > >
> > > So if a concurrent action wants to increase locked_vm with an acceptable
> > > step while another one has temporarily set it too high, it will now fail.
> > >
> > > I think we should keep the previous approach and do a cmpxchg after
> > > validating the new value.
>
> Wouldn't the cmpxchg alternative also be exposed to locked_vm changing between
> validating the new value and the cmpxchg(), so we'd bogusly fail even when
> there is still room, just because the value changed? (I'm assuming we don't
> hold any locks, otherwise all this is pointless.)
>
>   current_locked = atomic_read(&mm->locked_vm);
>   new_locked = current_locked + npages;
>   if (new_locked < lock_limit)
>      if (cmpxchg(&mm->locked_vm, current_locked, new_locked) == current_locked)
>            /* ENOMEM */

Well, it needs a loop..

again:
   current_locked = atomic_read(&mm->locked_vm);
   new_locked = current_locked + npages;
   if (new_locked < lock_limit)
      if (cmpxchg(&mm->locked_vm, current_locked, new_locked) != current_locked)
            goto again;

So it won't have bogus failures, as there is no unwind after error. Basically
this is a load-locked/store-conditional style of locking pattern.

> > That's a good idea, and especially worth doing considering that an arbitrary
> > number of threads that charge a low amount of locked_vm can fail just because
> > one thread charges lots of it.
>
> Yeah, but the window for this is quite small, I doubt it would be a real issue.
>
> What if, before doing the atomic_add_return(), we first did the racy new_locked
> check for ENOMEM, then do the speculative add and clean up if necessary. This
> would further reduce the scope of the window where a false ENOMEM can occur.
>
> > pinned_vm appears to be broken the same way, so I can fix it too unless someone
> > beats me to it.
>
> This should not be a surprise for the rdma folks. Cc'ing Jason nonetheless.

I think we accepted this tiny race as a side effect of removing the lock, which
was very beneficial. Really, the time window between the atomic failing and the
unwind is very small, and there are enough other ways a hostile user could DOS
locked_vm that I don't think it really matters in practice.

However, the cmpxchg seems better, so a helper to implement that would probably
be the best thing to do.

Jason
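
A minimal sketch of the kind of cmpxchg-based helper being suggested, built
from the loop above. The name locked_vm_try_charge(), the exact signature, and
the use of atomic_long_t are illustrative assumptions, not something taken from
this thread or from the series under discussion:

   #include <linux/atomic.h>
   #include <linux/errno.h>
   #include <linux/mm_types.h>

   /*
    * Hypothetical helper: charge npages against mm->locked_vm without
    * taking mmap_sem, retrying the cmpxchg until it either succeeds or
    * the limit would be exceeded.  Assumes locked_vm is an atomic_long_t;
    * the real series may use a different atomic type.
    */
   static int locked_vm_try_charge(struct mm_struct *mm, unsigned long npages,
                                   unsigned long lock_limit)
   {
           long current_locked, new_locked;

           do {
                   current_locked = atomic_long_read(&mm->locked_vm);
                   new_locked = current_locked + npages;
                   /* Validate before publishing: no unwind needed on failure. */
                   if (new_locked > lock_limit)
                           return -ENOMEM;
           } while (atomic_long_cmpxchg(&mm->locked_vm, current_locked,
                                        new_locked) != current_locked);

           return 0;
   }

Unlike the speculative atomic_add_return() approach, a value above the limit is
never made visible, so concurrent small charges cannot spuriously fail because
another thread has temporarily over-committed locked_vm.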