On Wed, 2022-06-15 at 12:47 +0900, Hyeonggon Yoo wrote: > On Tue, Jun 14, 2022 at 06:23:43PM +0000, Edgecombe, Rick P wrote: > > On Tue, 2022-06-14 at 15:53 +0900, Hyeonggon Yoo wrote: > > > On Tue, Jun 14, 2022 at 03:39:33PM +0900, Hyeonggon Yoo wrote: > > > > commit a8aed3e0752b4 ("x86/mm/pageattr: Prevent PSE and GLOABL > > > > leftovers > > > > to confuse pmd/pte_present and pmd_huge") made CPA clear > > > > _PAGE_GLOBAL when > > > > _PAGE_PRESENT is not set. This prevents kernel crashing when > > > > kernel > > > > reads > > > > a page with !_PAGE_PRESENT and _PAGE_PROTNONE (_PAGE_GLOBAL). > > > > And > > > > then it > > > > set _PAGE_GLOBAL back when setting _PAGE_PRESENT again. > > > > > > > > After commit d1440b23c922d ("x86/mm: Factor out pageattr > > > > _PAGE_GLOBAL > > > > setting") made kernel not set unconditionally _PAGE_GLOBAL, > > > > pages > > > > lose > > > > global flag after _set_pages_np() and _set_pages_p() are > > > > called. > > > > > > > > But after commit 3166851142411 ("x86: skip check for spurious > > > > faults for > > > > non-present faults"), spurious_kernel_fault() does not confuse > > > > pte/pmd entries with _PAGE_PROTNONE as present anymore. So > > > > simply > > > > drop pgprot_clear_protnone_bits(). > > > > > > > > > Looks like I forgot to Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> > > > > > > Plus I did check that kernel does not crash when reading > > > from/writing > > > to > > > non-present pages with this patch applied. > > > > Thanks for the history. > > > > I think we should still fix pte_present() to not check prot_none if > > the > > user bit is clear. > > I tried, but realized it wouldn't work :( > > For example, when a pte entry is used as swap entry, _PAGE_PRESENT is > cleared and _PAGE_PROTNONE is set. > > And other bits are used as type and offset of swap entry. > In that case, _PAGE_BIT_USER bit does not represent _PAGE_USER. > It is just one of bits that represents type of swap entry. > > So checking if _PAGE_PROTNONE set only when _PAGE_USER is set > will confuse some swap entries as non-present. Oooh, right. So the user bit records "when a pagetable is writeprotected by userfaultfd WP support". I'm not sure if maybe PCD is available to move that to and leave the user bit in place, but it sounds like an errata sensitive area to be tweaking. > > > The spurious fault handler infinite loop may no > > longer be a problem, but pte_present() still would return true for > > kernel NP pages, so be fragile. Today I see at least the oops > > message > > and memory hotunplug (see remove_pagetable()) that would get > > confused. > > As explained above, I don't think it's possible to make > pte_present() > accurate for both kernel and user ptes. > > Maybe we can implement pte_present_kernel()/pte_present_user() > for when kernel knows it is user or kernel pte. This seems like a decent option to me. It seems there are only a few places that are isolated to arch/x86. > > or pte_present_with_address(pte, address) if we don't > know it is user pte or kernel pte. >