On Fri, Apr 22, 2022 at 01:04:31PM +0200, David Hildenbrand wrote:
> On 22.04.22 12:28, Catalin Marinas wrote:
> > On Thu, Apr 21, 2022 at 06:37:49PM +0100, David Hildenbrand wrote:
> >> Note that in the (FOLL_WRITE|FOLL_FORCE) we only require VM_MAYWRITE on
> >> the vma and trigger a write fault. As the VMA is not VM_WRITE, we won't
> >> actually map the PTE writable, but set it dirty. GUP will retry, find a
> >> R/O pte that is dirty and where it knows that it broke COW and will
> >> allow the read access, although the PTE is R/O.
> >>
> >> That mechanism is required to e.g., set breakpoints in R/O MAP_PRIVATE
> >> kernel sections, but it's used elsewhere for page pinning as well.
> >>
> >> My gut feeling is that GUP(FOLL_WRITE|FOLL_FORCE) could be used right
> >> now to bypass that mechanism, I might be wrong.
> >
> > GUP can be used to bypass this. But if an attacker can trigger such GUP
> > paths via a syscall (e.g. ptrace(PTRACE_POKEDATA)), I think we need the
> > checks on those paths (and reject the syscall) rather than on
> > mmap/mprotect(). This would be covered by something like CAP_SYS_PTRACE.
>
> I was told that RDMA uses FOLL_FORCE|FOLL_WRITE and is available to
> unprivileged users.

Ah, do they really need this? At a quick search, ib_umem_get() for example:

	unsigned int gup_flags = FOLL_WRITE;
	...
	if (!umem->writable)
		gup_flags |= FOLL_FORCE;

I guess with a new MDWE flag we can make the GUP code ignore FOLL_FORCE if
VM_EXEC.

-- 
Catalin