On Wed, Feb 19, 2020 at 12:58:55PM +0100, Michal Hocko wrote: > On Tue 18-02-20 13:56:18, Andrea Arcangeli wrote: > > Hi Michal! > > > > On Tue, Feb 18, 2020 at 12:31:03PM +0100, Michal Hocko wrote: > > > On Mon 03-02-20 15:17:44, Andrea Arcangeli wrote: > > > > alpha, ia64, mips, powerpc, sh, sparc are relying on a check on > > > > mm->mm_users to know if they can skip some remote TLB flushes for > > > > single threaded processes. > > > > > > > > Most callers of use_mm() tend to invoke mmget_not_zero() or > > > > get_task_mm() before use_mm() to ensure the mm will remain alive in > > > > between use_mm() and unuse_mm(). > > > > > > > > Some callers however don't increase mm_users and they instead rely on > > > > serialization in __mmput() to ensure the mm will remain alive in > > > > between use_mm() and unuse_mm(). Not increasing mm_users during > > > > use_mm() is however unsafe for aforementioned arch TLB flushes > > > > optimizations. So either mmget()/mmput() should be added to the > > > > problematic callers of use_mm()/unuse_mm() or we can embed them in > > > > use_mm()/unuse_mm() which is more robust. > > > > > > I would prefer we do not do that because then the real owner of the mm > > > cannot really tear down the address space and the life time of it is > > > bound to a kernel thread doing the use_mm. This is undesirable I would > > > really prefer if the existing few users would use mmget only when they > > > really need to access mm. > > > > If the existing few users that don't already do the explicit mmget > > will have to start doing it too, the end result will be exactly the > > same that you described in your "cons" (lieftime of the mm will still > > be up to who did mmget;use_mm and didn't call unuse_mm;mmput yet). > > Well, they should use mmget only for moments when they access the > address space. I don't think so because mmget is a pure atomic_inc. How can you serialize that against a tlb flush that just does an atomic_read() on mm_users? At the very least you will have to invent something new slower than mmget that adds some serialization against the TLB flush code and the common code would need to learn to use that. And the question is how much faster that can be than switch_mm() that use_mm already invokes. It doesn't need to block but it needs a smp_mb() on both sides. i.e. after atomic_inc(&mm_users) you need a smp_mb() and a test_and_clear_bit and conditional local tlb flush. In the TLB flush code you need to set the bit, smp_mb() and then atomic_read(&mm_users). As things are now (no new mmget_local_tlb_flush() call in common code) the mmget has to happen regardless before use_mm() so there's a slight chance to serialize it in switch_mm arch code. Note I did put a smp_mb() myself in the switch_mm path to make it safe in 2/2. With your solution it'd be already possible to call mmput at any time after using the mm and before calling unuse_mm. But then you'd have to still call unuse_mm; mmget;use_mm again before you can use the mm again so why not to call unuse_mm immediately instead of only mmput? > > One reason to prefer adding the mmget to the callers to forget it, > > would be to avoid an atomic op in use_mm (for those callers that > > didn't forget it), but if anybody is doing use_mm in a fast path that > > won't be very fast anyway so I didn't think this was worth the > > risk. If that microoptimization in a slow path is the reason we should > > add mmget to the callers that forgot it that would be fine with me > > although I think it's risky because if already happened once and it > > could happen again (and when it happens it only bits a few arches if > > used with a few drivers). > > The primary concern really is that use_mm is usually not bound in time > and we do not want to pin the address space for such a long time without > any binding to a killable process. Almost all use_mm already do mmget before use_mm and mmput after unuse_mm, or perhaps I wouldn't have to find about this by source review only. So the cons you describe already affects the vast majority of use_mm cases (the non buggy ones). I see you concern for adding a "cons" to the minority of remaining cases, but it's still better to introduce that "cons" than corrupting userland memory. Thanks, Andrea