Re: [PATCH] mm: use correct VMA flags when freeing page-tables

Nadav Amit <nadav.amit@xxxxxxxxx> · Fri, 22 Oct 2021 08:37:08 -0700

> On Oct 21, 2021, at 8:06 PM, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Thu, 21 Oct 2021 05:23:22 -0700 Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
> 
>> From: Nadav Amit <namit@xxxxxxxxxx>
>> 
>> Consistent use of the mmu_gather interface requires a call to
>> tlb_start_vma() and tlb_end_vma() for each VMA. free_pgtables() does not
>> follow this pattern.
>> 
>> Certain architectures need tlb_start_vma() to be called in order for
>> tlb_update_vma_flags() to update the VMA flags (tlb->vma_exec and
>> tlb->vma_huge), which are later used for the proper TLB flush to be
>> issued. Since tlb_start_vma() is not called, this can lead to the wrong
>> VMA flags being used when the flush is performed.
>> 
>> Specifically, the munmap syscall would call unmap_region(), which unmaps
>> the VMAs and then frees the page-tables. A flush is needed after
>> the page-tables are removed to prevent page-walk caches from holding
>> stale entries, but this flush would use the flags of the VMA flags of
>> the last VMA that was flushed. This does not appear to be right.
> 
> Any thoughts on what the worst-case end-user cisible effects of this
> would be?
> 
> Again, I'm wondering about the desirability of a -stable backport.

This issue is not relevant for x86, which I am most familiar with, hence my
limited understanding of the impact on specific architectures.

In general, a TLB flush is needed after the page-tables are released
(in contrast to PTE removal) to prevent a speculative page-walk that might
access IO pages or install bogus entries. Such speculative page-walks have
been reported (on x86) as causing a machine-check. [1]

If a certain architecture has different page-walk caches for executable and
non-executable pages (i.e., different page-walk caches for iTLB and dTLB) or
for different page sizes, it might not perform the necessary TLB flush on the
proper TLB.

Looking at the code, we can see MIPS’s flow of:

tlb_flush()
->flush_tlb_range()
->local_flush_tlb_page()
->flush_micro_tlb_vm()

which calls flush_micro_tlb() only if (vma->vm_flags & VM_EXEC). So
MIPS cares about the VM_EXEC. Yet, it is not certain MIPS might be
affected.

For an architecture to be affected it needs to have all the following
properties:

1. Be able to experience hardware failure/exceptions on speculative
   page-walks.

2. Have separate page-walk caches for exec/data or huge/small pages.

3. Flush the TLBs based on VM_EXEC and VM_HUGETLB.

>From the code, MIPS has the 3rd property and presumably the 2nd, but it is
not certain it has the 1st.

I did not mark it as stable, since I am not sure such an architecture
exists.

[1] https://lore.kernel.org/lkml/tip-b956575bed91ecfb136a8300742ecbbf451471ab@xxxxxxxxxxxxxx/