Re: [PATCH v3 4/6] powerpc: Introduce temporary mm

Jann Horn <jannh@xxxxxxxxxx> · Thu, 27 Aug 2020 16:15:41 +0200

On Thu, Aug 27, 2020 at 7:24 AM Christopher M. Riedl <cmr@xxxxxxxxxxx> wrote:
> x86 supports the notion of a temporary mm which restricts access to
> temporary PTEs to a single CPU. A temporary mm is useful for situations
> where a CPU needs to perform sensitive operations (such as patching a
> STRICT_KERNEL_RWX kernel) requiring temporary mappings without exposing
> said mappings to other CPUs. A side benefit is that other CPU TLBs do
> not need to be flushed when the temporary mm is torn down.
>
> Mappings in the temporary mm can be set in the userspace portion of the
> address-space.
[...]
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
[...]
> @@ -44,6 +45,70 @@ int raw_patch_instruction(struct ppc_inst *addr, struct ppc_inst instr)
>  }
>
>  #ifdef CONFIG_STRICT_KERNEL_RWX
> +
> +struct temp_mm {
> +       struct mm_struct *temp;
> +       struct mm_struct *prev;
> +       bool is_kernel_thread;
> +       struct arch_hw_breakpoint brk[HBP_NUM_MAX];
> +};
> +
> +static inline void init_temp_mm(struct temp_mm *temp_mm, struct mm_struct *mm)
> +{
> +       temp_mm->temp = mm;
> +       temp_mm->prev = NULL;
> +       temp_mm->is_kernel_thread = false;
> +       memset(&temp_mm->brk, 0, sizeof(temp_mm->brk));
> +}
> +
> +static inline void use_temporary_mm(struct temp_mm *temp_mm)
> +{
> +       lockdep_assert_irqs_disabled();
> +
> +       temp_mm->is_kernel_thread = current->mm == NULL;

(That's a somewhat misleading variable name - kernel threads can have
a non-NULL ->mm, too.)

> +       if (temp_mm->is_kernel_thread)
> +               temp_mm->prev = current->active_mm;
> +       else
> +               temp_mm->prev = current->mm;

Why the branch? Shouldn't current->active_mm work in both cases?

> +       /*
> +        * Hash requires a non-NULL current->mm to allocate a userspace address
> +        * when handling a page fault. Does not appear to hurt in Radix either.
> +        */
> +       current->mm = temp_mm->temp;

This looks dangerous to me. There are various places that attempt to
find all userspace tasks that use a given mm by iterating through all
tasks on the system and comparing each task's ->mm pointer to
current's. Things like current_is_single_threaded() as part of various
security checks, mm_update_next_owner(), zap_threads(), and so on. So
if this is reachable from userspace task context (which I think it
is?), I don't think we're allowed to switch out the ->mm pointer here.

> +       switch_mm_irqs_off(NULL, temp_mm->temp, current);

switch_mm_irqs_off() calls switch_mmu_context(), which in the nohash
implementation increments next->context.active and decrements
prev->context.active if prev is non-NULL, right? So this would
increase temp_mm->temp->context.active...

> +       if (ppc_breakpoint_available()) {
> +               struct arch_hw_breakpoint null_brk = {0};
> +               int i = 0;
> +
> +               for (; i < nr_wp_slots(); ++i) {
> +                       __get_breakpoint(i, &temp_mm->brk[i]);
> +                       if (temp_mm->brk[i].type != 0)
> +                               __set_breakpoint(i, &null_brk);
> +               }
> +       }
> +}
> +
> +static inline void unuse_temporary_mm(struct temp_mm *temp_mm)
> +{
> +       lockdep_assert_irqs_disabled();
> +
> +       if (temp_mm->is_kernel_thread)
> +               current->mm = NULL;
> +       else
> +               current->mm = temp_mm->prev;
> +       switch_mm_irqs_off(NULL, temp_mm->prev, current);

... whereas this would increase temp_mm->prev->context.active. As far
as I can tell, that'll mean that both the original mm and the patching
mm will have their .active counts permanently too high after
use_temporary_mm()+unuse_temporary_mm()?

> +       if (ppc_breakpoint_available()) {
> +               int i = 0;
> +
> +               for (; i < nr_wp_slots(); ++i)
> +                       if (temp_mm->brk[i].type != 0)
> +                               __set_breakpoint(i, &temp_mm->brk[i]);
> +       }
> +}