Re: [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

On 7/25/23 09:37, Marcelo Tosatti wrote:
>> TLB flushes for freed page tables are another game entirely.  The CPU is
>> free to cache any part of the paging hierarchy it wants at any time.
> Depend on CONFIG_PAGE_TABLE_ISOLATION=y, which flushes TLB (and page
> table caches) on user->kernel and kernel->user context switches ?

Well, first of all, CONFIG_PAGE_TABLE_ISOLATION doesn't flush the TLB at
all on user<->kernel switches when PCIDs are enabled.
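
To make the PCID point concrete, here is a minimal sketch of how a
user-mode CR3 value might be derived from the kernel one under PTI.  The
bit positions (PCID in CR3[11:0], no-flush in bit 63, user PCID bit 11,
user page-table bit 12) follow the architecture and my reading of the
Linux headers; the helper names are hypothetical, and the real entry code
does this in assembly and also tracks whether a given PCID needs a flush,
which this sketch does not model.

```c
#include <stdint.h>

/* CR3[11:0] holds the PCID when CR4.PCIDE=1. */
#define SKETCH_CR3_PCID_MASK     0xFFFull
/* MOV-to-CR3 with bit 63 set does not invalidate the target PCID's entries. */
#define SKETCH_CR3_NOFLUSH       (1ull << 63)
/* Assumed Linux PTI layout: user PCID = kernel PCID with bit 11 set. */
#define SKETCH_PTI_USER_PCID_BIT 11
/* Assumed Linux PTI layout: user PGD is the second page of an 8k pair. */
#define SKETCH_PTI_USER_PGD_BIT  12

/* Hypothetical helper: build the user-mode CR3 from the kernel-mode CR3. */
static inline uint64_t sketch_user_cr3(uint64_t kernel_cr3, int noflush)
{
	uint64_t cr3 = kernel_cr3
		     | (1ull << SKETCH_PTI_USER_PCID_BIT)
		     | (1ull << SKETCH_PTI_USER_PGD_BIT);

	return noflush ? (cr3 | SKETCH_CR3_NOFLUSH)
		       : (cr3 & ~SKETCH_CR3_NOFLUSH);
}

/* Hypothetical helper: would writing this value invalidate the PCID's entries? */
static inline int sketch_cr3_write_flushes(uint64_t cr3)
{
	return !(cr3 & SKETCH_CR3_NOFLUSH);
}
```

With the no-flush bit set, the switch leaves the TLB entries tagged with
that PCID in place, which is why PTI alone cannot be relied on as a flush.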

Second, even if it did, the CPU is still free to cache any portion of
the paging hierarchy at any time.  Without LASS[1], userspace can even
_compel_ walks of the kernel portion of the address space, and we don't
have any infrastructure to tell if a freed kernel page is exposed in the
user copy of the page tables with PTI.

Third, (also ignoring PCIDs) there are plenty of instructions between
kernel entry and the MOV-to-CR3 that can flush the TLB.  All of those
instructions are architecturally permitted to speculatively set Accessed
or Dirty bits in any part of the address space.  If they run into a
freed page table page, things get ugly.

These accesses are not _likely_.  There probably isn't a predictor out
there that's going to see a:

	movq    %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)

and go off trying to dirty memory in the vmalloc() area.  But we'd need
some backward *and* forward-looking guarantees from our intrepid CPU
designers to promise that this kind of thing is safe yesterday, today
and tomorrow.  I suspect such a guarantee is going to be hard to obtain.

1. https://lkml.kernel.org/r/20230110055204.3227669-1-yian.chen@xxxxxxxxx





