Re: [PATCH v4 2/4] lazy tlb: allow lazy tlb mm refcounting to be configurable

Nicholas Piggin <npiggin@xxxxxxxxx> · Mon, 14 Jun 2021 14:14:27 +1000

Excerpts from Andy Lutomirski's message of June 14, 2021 1:52 pm:
> On 6/13/21 5:45 PM, Nicholas Piggin wrote:
>> Excerpts from Andy Lutomirski's message of June 9, 2021 2:20 am:
>>> On 6/4/21 6:42 PM, Nicholas Piggin wrote:
>>>> Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm
>>>> when it is context switched. This can be disabled by architectures that
>>>> don't require this refcounting if they clean up lazy tlb mms when the
>>>> last refcount is dropped. Currently this is always enabled, which is
>>>> what existing code does, so the patch is effectively a no-op.
>>>>
>>>> Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is.
>>>
>>> I am in favor of this approach, but I would be a lot more comfortable
>>> with the resulting code if task->active_mm were at least better
>>> documented and possibly even guarded by ifdefs.
>> 
>> active_mm is fairly well documented in Documentation/active_mm.rst IMO.
>> I don't think anything has changed in 20 years, I don't know what more
>> is needed, but if you can add to documentation that would be nice. Maybe
>> moving a bit of that into .c and .h files?
>> 
> 
> Quoting from that file:
> 
>   - however, we obviously need to keep track of which address space we
>     "stole" for such an anonymous user. For that, we have "tsk->active_mm",
>     which shows what the currently active address space is.
> 
> This isn't even true right now on x86.

>From the perspective of core code, it is. x86 might do something crazy 
with it, but it has to make it appear this way to non-arch code that
uses active_mm.

Is x86's scheme documented?

> With your patch applied:
> 
>  To support all that, the "struct mm_struct" now has two counters: a
>  "mm_users" counter that is how many "real address space users" there are,
>  and a "mm_count" counter that is the number of "lazy" users (ie anonymous
>  users) plus one if there are any real users.
> 
> isn't even true any more.

Well yeah but the active_mm concept hasn't changed. The refcounting 
change is hopefully reasonably documented?

> 
> 
>>> x86 bare metal currently does not need the core lazy mm refcounting, and
>>> x86 bare metal *also* does not need ->active_mm.  Under the x86 scheme,
>>> if lazy mm refcounting were configured out, ->active_mm could become a
>>> dangling pointer, and this makes me extremely uncomfortable.
>>>
>>> So I tend to think that, depending on config, the core code should
>>> either keep ->active_mm [1] alive or get rid of it entirely.
>> 
>> I don't actually know what you mean.
>> 
>> core code needs the concept of an "active_mm". This is the mm that your 
>> kernel threads are using, even in the unmerged CONFIG_LAZY_TLB=n patch,
>> active_mm still points to init_mm for kernel threads.
> 
> Core code does *not* need this concept.  First, it's wrong on x86 since
> at least 4.15.  Any core code that actually assumes that ->active_mm is
> "active" for any sensible definition of the word active is wrong.
> Fortunately there is no such code.
> 
> I looked through all active_mm references in core code.  We have:
> 
> kernel/sched/core.c: it's all refcounting, although it's a bit tangled
> with membarrier.
> 
> kernel/kthread.c: same.  refcounting and membarrier stuff.
> 
> kernel/exit.c: exit_mm() a BUG_ON().
> 
> kernel/fork.c: initialization code and a warning.
> 
> kernel/cpu.c: cpu offline stuff.  wouldn't be needed if active_mm went away.
> 
> fs/exec.c: nothing of interest

I might not have been clear. Core code doesn't need active_mm if 
active_mm somehow goes away. I'm saying active_mm can't go away because
it's needed to support (most) archs that do lazy tlb mm switching.

The part I don't understand is when you say it can just go away. How? 

> I didn't go through drivers, but I maintain my point.  active_mm is
> there for refcounting.  So please don't just make it even more confusing
> -- do your performance improvement, but improve the code at the same
> time: get rid of active_mm, at least on architectures that opt out of
> the refcounting.

powerpc opts out of the refcounting and can not "get rid of active_mm".
Not even in theory.

Thanks,
Nick