On Tue, Oct 8, 2024 at 9:52 AM Mathieu Desnoyers
<mathieu.desnoyers@xxxxxxxxxxxx> wrote:
>
> Replace lazy active mm existence tracking with hazard pointers. This
> removes the following implementations and their associated config
> options:
>
> - MMU_LAZY_TLB_REFCOUNT
> - MMU_LAZY_TLB_SHOOTDOWN
> - This removes the call_rcu delayed mm drop for RT.
>
> It leverages the fact that each CPU only ever has at most one single
> lazy active mm. This makes it a very good fit for a hazard pointer
> domain implemented with one hazard pointer slot per CPU.
>
> -static void cleanup_lazy_tlbs(struct mm_struct *mm)
> +static void remove_lazy_mm_hp(int cpu, struct hazptr_slot *slot, void *addr)
> {
> -	if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
> -		/*
> -		 * In this case, lazy tlb mms are refcounted and would not reach
> -		 * __mmdrop until all CPUs have switched away and mmdrop()ed.
> -		 */
> -		return;
> -	}
> +	smp_call_function_single(cpu, do_shoot_lazy_tlb, addr, 1);
> +	smp_call_function_single(cpu, do_check_lazy_tlb, addr, 1);
> +}
>
> +static void cleanup_lazy_tlbs(struct mm_struct *mm)
> +{

[...]

> -	on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
> -	if (IS_ENABLED(CONFIG_DEBUG_VM_SHOOT_LAZIES))
> -		on_each_cpu(do_check_lazy_tlb, (void *)mm, 1);
> +	hazptr_scan(&hazptr_domain_sched_lazy_mm, mm, remove_lazy_mm_hp);

Hey Mathieu,

Take my comments with a grain of salt, as I am digging into active_mm
again after a while.

This seems to me like a strange hazard pointer callback use case,
because the "scan" here retires the object immediately even though
readers still hold a "reference". The callback forces every other
reader holding a reference to switch away right now, whether they like
it or not, rather than waiting until _they_ release the reference.
There is no precedent for that in RCU, for instance, where a callback
never runs before the readers have finished.
That might work for active_mm, but it sounds like a fringe use case to
me; it might be better to just force CONFIG_MMU_LAZY_TLB_SHOOTDOWN=y
for everyone and use on_each_cpu() instead. That would give the same
code simplification as this patch without requiring hazard pointers,
AFAICS. Or maybe start with that, and _then_ convert to HP if it makes
sense to. These are just some thoughts, and I am OK with whatever the
reviewers' consensus is.

And if I understand correctly, for this use case we are not even
grabbing a "true reference" to the mm_struct object, because direct
access to mm_struct should require a proper mmgrab(), not a
lazy_tlb-flavored one? Correct me if I'm wrong, though.

Also, isn't it the case that on x86 this patch will now send more
IPIs, whereas previously the global refcount did not require them,
since the last kthread switching out would no longer access the old
mm? Might it be worth checking the performance of fork/exit and
whether it scales?

thanks,

 - Joel