Re: [RFC PATCH 2/6] x86/mm: temporary mm struct

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Wed, 29 Aug 2018 18:59:52 -0700

> On Aug 29, 2018, at 6:38 PM, Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
> 
> On Wed, 29 Aug 2018 08:41:00 -0700
> Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> 
>>> On Wed, Aug 29, 2018 at 2:49 AM, Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>>> On Wed, 29 Aug 2018 01:11:43 -0700
>>> Nadav Amit <namit@xxxxxxxxxx> wrote:
>>> 
>>>> From: Andy Lutomirski <luto@xxxxxxxxxx>
>>>> 
>>>> Sometimes we want to set temporary page-table entries (PTEs) on one of
>>>> the cores without allowing other cores to use these mappings, even
>>>> speculatively. There are two benefits to doing so:
>>>> 
>>>> (1) Security: if sensitive PTEs are set, a temporary mm prevents their
>>>> use on other cores. This hardens security because it prevents
>>>> exploiting a dangling pointer to overwrite sensitive data through the
>>>> sensitive PTE.
>>>> 
>>>> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed from
>>>> remote TLBs.
>>>> 
>>>> To do so, a temporary mm_struct can be used. Mappings which are private
>>>> to this mm can be set in the userspace part of the address space.
>>>> Interrupts must be disabled for the whole time the temporary mm is
>>>> loaded.
>>>> 
>>>> The first use-case for temporary PTEs, which will follow, is for poking
>>>> the kernel text.
>>>> 
>>>> [ Commit message was written by Nadav ]
>>>> 
>>>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>>>> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
>>>> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
>>>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>>> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>>>> ---
>>>> arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
>>>> 1 file changed, 20 insertions(+)
>>>> 
>>>> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
>>>> index eeeb9289c764..96afc8c0cf15 100644
>>>> --- a/arch/x86/include/asm/mmu_context.h
>>>> +++ b/arch/x86/include/asm/mmu_context.h
>>>> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>>>>      return cr3;
>>>> }
>>>> 
>>>> +typedef struct {
>>>> +     struct mm_struct *prev;
>>>> +} temporary_mm_state_t;
>>>> +
>>>> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
>>>> +{
>>>> +     temporary_mm_state_t state;
>>>> +
>>>> +     lockdep_assert_irqs_disabled();
>>>> +     state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>>>> +     switch_mm_irqs_off(NULL, mm, current);
>>>> +     return state;
>>>> +}
>>> 
>>> Hmm, why don't we return mm_struct *prev directly?
>> 
>> I did it this way to make it easier to add debugging code later.
>> Also, when I first wrote this, I stashed the old CR3 instead of the
>> old mm_struct, and it seemed like callers should be insulated from
>> details like this.
> 
> Hmm, I see. But in that case, we should call it "struct temporary_mm"
> and explicitly allocate (and pass) it, since we cannot return the
> data structure from the stack.

Why not?

> If we combine it with the new mm, it will be better encapsulated,
> e.g.:
> 
> struct temporary_mm {
>    struct mm_struct *mm;
>    struct mm_struct *prev;
> };
> 
> static struct temporary_mm poking_tmp_mm;
> 
> poking_init()
> {
>    if (init_temporary_mm(&poking_tmp_mm, &init_mm))
>        goto error;
>    ...
> }
> 
> text_poke_safe()
> {
>    ...
>    use_temporary_mm(&poking_tmp_mm);
>    ...
>    unuse_temporary_mm(&poking_tmp_mm);
> }
> 
> Any thoughts?

That seems more complicated for not very much gain.