Excerpts from Andy Lutomirski's message of June 9, 2021 2:20 am:
> On 6/4/21 6:42 PM, Nicholas Piggin wrote:
>> Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm
>> when it is context switched. This can be disabled by architectures that
>> don't require this refcounting if they clean up lazy tlb mms when the
>> last refcount is dropped. Currently this is always enabled, which is
>> what existing code does, so the patch is effectively a no-op.
>>
>> Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is.
>
> I am in favor of this approach, but I would be a lot more comfortable
> with the resulting code if task->active_mm were at least better
> documented and possibly even guarded by ifdefs.

active_mm is fairly well documented in Documentation/active_mm.rst IMO.
I don't think anything has changed in 20 years, I don't know what more
is needed, but if you can add to documentation that would be nice. Maybe
moving a bit of that into .c and .h files?

> x86 bare metal currently does not need the core lazy mm refcounting, and
> x86 bare metal *also* does not need ->active_mm. Under the x86 scheme,
> if lazy mm refcounting were configured out, ->active_mm could become a
> dangling pointer, and this makes me extremely uncomfortable.
>
> So I tend to think that, depending on config, the core code should
> either keep ->active_mm [1] alive or get rid of it entirely.

I don't actually know what you mean.

core code needs the concept of an "active_mm". This is the mm that your
kernel threads are using, even in the unmerged CONFIG_LAZY_TLB=n patch,
active_mm still points to init_mm for kernel threads.

We could hide that idea behind an active_mm() function that would always
return &init_mm if mm==NULL, but you still have the concept of an active
mm and a pointer that callers must not access after free (because some
cases will be CONFIG_LAZY_TLB=y).

> [1] I don't really think it belongs in task_struct at all. It's not a
> property of the task. It's the *per-cpu* mm that the core code is
> keeping alive for lazy purposes. How about consolidating it with the
> copy in rq?

I agree it's conceptually a per-cpu property. I don't know why it was
done this way, maybe it was just convenient and works well for mm and
active_mm to be adjacent. Linus might have a better insight.

> I guess the short summary of my opinion is that I like making this
> configurable, but I do not like the state of the code.

I don't think I'd object to moving active_mm to rq and converting all
usages to active_mm() while we're there, it would make things a bit
more configurable. But I don't see it making core code fundamentally
less complex... if you're referring to the x86 mm switching monstrosity,
then that's understandable, but I admit I haven't spent enough time
looking at it to make a useful comment. A patch would be enlightening,
I have the leftover CONFIG_LAZY_TLB=n patch if you were thinking of
building on that I can send it to you.

Thanks,
Nick