On Thu, May 18, 2023 at 2:43 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > > On 5/15/23 06:05, jeffxu@xxxxxxxxxxxx wrote: > > +static inline int __arch_check_vma_pkey_for_write(struct vm_area_struct *vma) > > +{ > > + int pkey = vma_pkey(vma); > > + > > + if (mm_pkey_enforce_api(vma->vm_mm, pkey)) { > > + if (!__pkru_allows_write(read_pkru(), pkey)) > > + return -EACCES; > > + } > > + > > + return 0; > > +} > > Please think very carefully about what I'm about to say: > > What connects vma->vm_mm to read_pkru() here? > > Now think about what happens when we have kthread_use_mm() or a ptrace() > doing get_task_mm() and working on another process's mm or VMA. > > Look at arch_vma_access_permitted() and notice how it avoids read_pkru() > for 'foreign' aka. 'remote' accesses: > > > static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, > > bool write, bool execute, bool foreign) > > { > ... > > if (foreign || vma_is_foreign(vma)) > > return true; > > return // check read_pkru() > > } > > In other words, it lets all remote accesses right through. That's > because there is *NOTHING* that fundamentally and tightly connects the > PKRU value in this context to the VMA or the context that initiated this > operation. > > If your security model depends on PKRU protection, this 'remote' > disconnection is problematic. The PKRU enforcement inside the kernel is > best-effort. That usually doesn't map into the security space very well. > > Do you have a solid handle on all call paths that will reach > __arch_check_vma_pkey_for_write() and can you ensure they are all > non-remote? Is this about the attack scenario where the attacker uses ptrace() into the chrome process ? if so it is not in our threat model, and that is more related to sandboxing on the host. Or is this about io_uring? Yes, io_uring kernel thread breaks our expectations of PKRU & user space threads, however I thought the break is not just for this - any syscall involved in memory operation will break after into io_uring ? Other than those, yes, I try to ensure the check is only used at the beginning of syscall entry in all cases, which should be non-remote I hope. Thanks -Jeff