On 05/07/2024 18:59, Catalin Marinas wrote:
> On Fri, May 03, 2024 at 02:01:35PM +0100, Joey Gouly wrote:
>> @@ -163,7 +182,8 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
>>  #define pte_access_permitted_no_overlay(pte, write) \
>>  	(((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER)) && (!(write) || pte_write(pte)))
>>  #define pte_access_permitted(pte, write) \
>> -	pte_access_permitted_no_overlay(pte, write)
>> +	(pte_access_permitted_no_overlay(pte, write) && \
>> +	por_el0_allows_pkey(FIELD_GET(PTE_PO_IDX_MASK, pte_val(pte)), write, false))
> I'm still not entirely convinced on checking the keys during fast GUP
> but that's what x86 and powerpc do already, so I guess we'll follow the
> same ABI.

I've thought about this some more. In summary I don't think adding this
check to pte_access_permitted() is controversial, but we should decide
how POR_EL0 is set for kernel threads.

This change essentially means that fast GUP behaves like uaccess for
pages that are already present: in both cases POR_EL0 will be looked up
based on the POIndex of the page being accessed (by the hardware in the
uaccess case, and explicitly in the fast GUP case). Fast GUP always
operates on current->mm, so to me checking POR_EL0 in
pte_access_permitted() should be no more restrictive than a uaccess
check from a user perspective. In other words, POR_EL0 is checked when
the kernel accesses user memory on the user's behalf, whether through
uaccess or GUP.

It's also worth noting that the "slow" GUP path (which
get_user_pages_fast() falls back to if a page is missing) also checks
POR_EL0 by virtue of calling handle_mm_fault(), which in turn calls
arch_vma_access_permitted(). It would be pretty inconsistent for the
slow GUP path to do a pkey check but not the fast path. (That said, the
slow GUP path does not call arch_vma_access_permitted() if a page is
already present, so callers of get_user_pages() and similar will get
inconsistent checking. Not great, that may be worth fixing - but that's
clearly beyond the scope of this series.)

Now an interesting question is what happens with kernel threads that
access user memory, as is the case for the optional io_uring kernel
thread (IORING_SETUP_SQPOLL). The discussion above holds regardless of
the type of thread, so the sqpoll thread will have its POR_EL0 checked
when processing commands that involve uaccess or GUP. AFAICT, this
series does not have special handling for kernel threads w.r.t. POR_EL0,
which means that it is left unchanged when a new kernel thread is cloned
(create_io_thread() in the IORING_SETUP_SQPOLL case). The sqpoll thread
will therefore inherit POR_EL0 from the (user) thread that calls
io_uring_setup(). In other words, the sqpoll thread ends up with the
same view of user memory as that user thread - for instance if its
POR_EL0 prevents access to POIndex 1, then any I/O that the sqpoll
thread attempts on mappings with POIndex/pkey 1 will fail.

This behaviour seems potentially useful to me, as the io_uring SQ could
easily become a way to bypass POE without some restriction. However, it
feels like this should be documented, as one should keep it in mind when
using pkeys, and there may well be other cases where kernel threads are
impacted by POR_EL0. I am also unsure how x86/ppc handle this.

Kevin
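
For illustration, here is a rough user-space model of the inheritance
scenario above. It is only a sketch, not code from the series: it
assumes POR_EL0 holds sixteen 4-bit permission fields indexed by
POIndex, with R/X/W in bits 0/1/2 of each field, and it passes the
register value as a parameter instead of reading the system register.
The names model_por_allows_pkey() and model_clone_por() are hypothetical
stand-ins for por_el0_allows_pkey() and for the "left unchanged on
clone" behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed per-POIndex permission bits (4-bit field per index). */
#define POE_R 0x1u
#define POE_X 0x2u
#define POE_W 0x4u

/* Extract the 4-bit permission field for POIndex idx (0..15). */
static inline unsigned int por_perm(uint64_t por, unsigned int idx)
{
	return (unsigned int)((por >> (idx * 4)) & 0xf);
}

/* Return por with the field for POIndex idx replaced by perm. */
static inline uint64_t por_set_perm(uint64_t por, unsigned int idx,
				    unsigned int perm)
{
	por &= ~((uint64_t)0xf << (idx * 4));
	return por | ((uint64_t)(perm & 0xf) << (idx * 4));
}

/*
 * Hypothetical model of the pkey check: write takes precedence over
 * execute, and a plain read is checked otherwise.
 */
static inline bool model_por_allows_pkey(uint64_t por, unsigned int pkey,
					 bool write, bool execute)
{
	unsigned int perm = por_perm(por, pkey);

	if (write)
		return (perm & POE_W) != 0;
	if (execute)
		return (perm & POE_X) != 0;
	return (perm & POE_R) != 0;
}

/*
 * Hypothetical model of cloning a kernel thread with no special
 * handling: the new thread simply keeps the parent's POR_EL0 value.
 */
static inline uint64_t model_clone_por(uint64_t parent_por)
{
	return parent_por;
}
```

With a parent value that grants RW on POIndex 0 and nothing on POIndex
1, the cloned (sqpoll-like) thread passes the check for pkey 0 and fails
it for pkey 1, for reads as well as writes.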