On 8/11/2022 7:30 PM, Hyeonggon Yoo wrote:
> On Thu, Aug 11, 2022 at 08:16:08AM +0000, Lu, Aaron wrote:
>> On Thu, 2022-08-11 at 05:21 +0000, Hyeonggon Yoo wrote:
>>> On Mon, Aug 08, 2022 at 10:56:46PM +0800, Aaron Lu wrote:
>>>> For configs that don't have PTI enabled or cpus that don't need
>>>> meltdown mitigation, the current kernel can lose the GLOBAL bit after
>>>> a page goes through a cycle of present -> not present -> present.
>>>>
>>>> It happens like this (__vunmap() does this in vm_remove_mappings()):
>>>> original page protection: 0x8000000000000163 (NX/G/D/A/RW/P)
>>>> set_memory_np(page, 1):   0x8000000000000062 (NX/D/A/RW) loses G and P
>>>> set_memory_p(page, 1):    0x8000000000000063 (NX/D/A/RW/P) restores P
>>>>
>>>> In the end, this page's protection no longer has the Global bit set,
>>>> and that creates a problem for this merge-small-mappings feature.
>>>>
>>>> For this reason, restore the Global bit for systems that do not have
>>>> PTI enabled if the page is present.
>>>>
>>>> (pgprot_clear_protnone_bits() deserves a better name if this patch is
>>>> acceptable, but first I would like some feedback on whether this is
>>>> the right way to solve this, so I didn't bother with the name yet.)
>>>>
>>>> Signed-off-by: Aaron Lu <aaron.lu@xxxxxxxxx>
>>>> ---
>>>>  arch/x86/mm/pat/set_memory.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>>> index 1abd5438f126..33657a54670a 100644
>>>> --- a/arch/x86/mm/pat/set_memory.c
>>>> +++ b/arch/x86/mm/pat/set_memory.c
>>>> @@ -758,6 +758,8 @@ static pgprot_t pgprot_clear_protnone_bits(pgprot_t prot)
>>>>  	 */
>>>>  	if (!(pgprot_val(prot) & _PAGE_PRESENT))
>>>>  		pgprot_val(prot) &= ~_PAGE_GLOBAL;
>>>> +	else
>>>> +		pgprot_val(prot) |= _PAGE_GLOBAL & __default_kernel_pte_mask;
>>>>
>>>>  	return prot;
>>>>  }
>>>
>>> IIUC it makes it impossible to set _PAGE_GLOBAL when PTI is on.
>>>
>>
>> Yes. Is this a problem?
>> I think that is the intended behaviour when PTI is on: not to enable
>> the Global bit on kernel mappings.
>
> Please note that I'm not an expert on PTI.
>
> But AFAIK with PTI, at least everything (the kernel part) mapped into
> the user page table is mapped as global when PGE is supported.
>
> I'm not sure "the Global bit is never used for the kernel part when PTI
> is enabled" is true.
>
> Also, commit d1440b23c922d ("x86/mm: Factor out pageattr _PAGE_GLOBAL
> setting"), which introduced pgprot_clear_protnone_bits(), says:
>
>     This unconditional setting of _PAGE_GLOBAL is a problem when we have
>     PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
>     not.
>
>     This updated version of the code says:
>     1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
>     2. Never set _PAGE_GLOBAL implicitly
>     3. Allow _PAGE_GLOBAL to be in cpa.set_mask
>     4. Allow _PAGE_GLOBAL to be inherited from previous PTE

Thanks for this info, I'll need to take a closer look at PTI.
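To make the failure mode concrete, here is roughly what happens to the
direct map PTE across that cycle (a sketch based on the changelog's
trace and my reading of pgprot_clear_protnone_bits(), not a tested
trace):

	/* starts as 0x8000000000000163 (NX/G/D/A/RW/P) */
	set_memory_np(addr, 1);
	/*
	 * pgprot_clear_protnone_bits() sees !_PAGE_PRESENT and clears
	 * _PAGE_GLOBAL as well: 0x8000000000000062 (NX/D/A/RW)
	 */
	set_memory_p(addr, 1);
	/*
	 * _PAGE_PRESENT comes back, but nothing restores _PAGE_GLOBAL:
	 * 0x8000000000000063 (NX/D/A/RW/P). The PTE now differs from its
	 * neighbours in G, so the surrounding 4K mappings cannot be
	 * merged back into a large page.
	 */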
>>> Maybe it would be less intrusive to make
>>> set_direct_map_default_noflush() replace the protection bits
>>> with PAGE_KERNEL, as it's only called for the direct map and the
>>> function is meant to reset permissions to the default:
>>>
>>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
>>> index 1abd5438f126..0dd4433c1382 100644
>>> --- a/arch/x86/mm/pat/set_memory.c
>>> +++ b/arch/x86/mm/pat/set_memory.c
>>> @@ -2250,7 +2250,16 @@ int set_direct_map_invalid_noflush(struct page *page)
>>>
>>>  int set_direct_map_default_noflush(struct page *page)
>>>  {
>>> -	return __set_pages_p(page, 1);
>>> +	unsigned long tempaddr = (unsigned long) page_address(page);
>>> +	struct cpa_data cpa = {
>>> +		.vaddr = &tempaddr,
>>> +		.pgd = NULL,
>>> +		.numpages = 1,
>>> +		.mask_set = PAGE_KERNEL,
>>> +		.mask_clr = __pgprot(~0),
>
> Nah, this sets _PAGE_ENC unconditionally, which should be evaluated.
> Maybe a less intrusive way would be:
>
> 	.mask_set = __pgprot(_PAGE_PRESENT |
> 			     (_PAGE_GLOBAL & __default_kernel_pte_mask)),
> 	.mask_clr = __pgprot(0),
>
>>> +		.flags = 0};
>>> +
>>> +	return __change_page_attr_set_clr(&cpa, 0);
>>>  }
>>
>> Looks reasonable to me and it is indeed less intrusive. I'm only
>> concerned there might be other paths that also go through present ->
>> not present -> present, which this change cannot cover.
>>
>
> AFAIK the only other path that goes through present -> not present ->
> present (using CPA) is when DEBUG_PAGEALLOC is used.
>
> Do we care about direct map fragmentation when using DEBUG_PAGEALLOC?
>

No, the direct map does not use large page mappings when DEBUG_PAGEALLOC
is enabled.

>>> set_direct_map_{invalid,default}_noflush() is the exact reason
>>> why the direct map becomes split after vmalloc/vfree with special
>>> permissions.
>>
>> Yes, I agree, because it can lose the G bit after the whole cycle when
>> PTI is not on. When PTI is on, there is no such problem because the G
>> bit is not there initially.
>>
>> Thanks,
>> Aaron
>
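For reference, folding your suggested masks in,
set_direct_map_default_noflush() would end up looking like this
(untested, just to confirm I read the proposal correctly):

	int set_direct_map_default_noflush(struct page *page)
	{
		unsigned long tempaddr = (unsigned long)page_address(page);
		struct cpa_data cpa = {
			.vaddr = &tempaddr,
			.pgd = NULL,
			.numpages = 1,
			/*
			 * Restore P; restore G only where the kernel
			 * default allows it, i.e. when PTI hasn't masked
			 * _PAGE_GLOBAL out of __default_kernel_pte_mask.
			 */
			.mask_set = __pgprot(_PAGE_PRESENT |
					     (_PAGE_GLOBAL & __default_kernel_pte_mask)),
			/* clear nothing, so bits like _PAGE_ENC are untouched */
			.mask_clr = __pgprot(0),
			.flags = 0,
		};

		return __change_page_attr_set_clr(&cpa, 0);
	}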