On 22.04.22 15:12, Catalin Marinas wrote:
> On Fri, Apr 22, 2022 at 01:04:31PM +0200, David Hildenbrand wrote:
>> On 22.04.22 12:28, Catalin Marinas wrote:
>>> On Thu, Apr 21, 2022 at 06:37:49PM +0100, David Hildenbrand wrote:
>>>> Note that in the (FOLL_WRITE|FOLL_FORCE) we only require VM_MAYWRITE on
>>>> the vma and trigger a write fault. As the VMA is not VM_WRITE, we won't
>>>> actually map the PTE writable, but set it dirty. GUP will retry, find a
>>>> R/O pte that is dirty and where it knows that it broke COW and will
>>>> allow the read access, although the PTE is R/O.
>>>>
>>>> That mechanism is required to e.g., set breakpoints in R/O MAP_PRIVATE
>>>> kernel sections, but it's used elsewhere for page pinning as well.
>>>>
>>>> My gut feeling is that GUP(FOLL_WRITE|FOLL_FORCE) could be used right
>>>> now to bypass that mechanism, I might be wrong.
>>>
>>> GUP can be used to bypass this. But if an attacker can trigger such GUP
>>> paths via a syscall (e.g. ptrace(PTRACE_POKEDATA)), I think we need the
>>> checks on those paths (and reject the syscall) rather than on
>>> mmap/mprotect(). This would be covered by something like CAP_SYS_PTRACE.
>>
>> I was told that RDMA uses FOLL_FORCE|FOLL_WRITE and is available to
>> unprivileged users.
>
> Ah, do they really need this? At a quick search, ib_umem_get() for
> example:
>
> 	unsigned int gup_flags = FOLL_WRITE;
> 	...
> 	if (!umem->writable)
> 		gup_flags |= FOLL_FORCE;
>
> I guess with a new MDWE flag we can make the GUP code ignore FOLL_FORCE
> if VM_EXEC.
>

It's, for example, required when you have a MAP_PRIVATE but PROT_READ
mapping and want to take a reliable R/O (!) pin. Without
FOLL_FORCE|FOLL_WRITE you'd be pinning a (shared zeropage, pagecache) page
that will get replaced by an anonymous page in the COW handler, after
mprotect(PROT_READ|PROT_WRITE) followed by a write access.

That was an issue for RDMA in the past, that's why we have that handling
in place IIRC.

Yes, it's ugly.

-- 
Thanks,

David / dhildenb
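
For illustration only, the userspace side of the MAP_PRIVATE + PROT_READ
scenario described above could be sketched roughly as below. This is not
kernel or RDMA code and it cannot observe a pin; it only shows that the
page backing the mapping changes after mprotect(PROT_READ|PROT_WRITE)
followed by a write, which is why a pin taken earlier without breaking COW
would end up referencing the stale pagecache page. The function name
cow_after_mprotect_demo() and the /tmp template path are made up for this
sketch.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map a file MAP_PRIVATE with
 * PROT_READ, then mprotect() it writable and write to it.
 *
 * Returns 0 if the write landed in a private anonymous copy (the file
 * still holds the old byte), -1 on any error or unexpected result.
 */
int cow_after_mprotect_demo(void)
{
	char path[] = "/tmp/cow-demo-XXXXXX";	/* assumed scratch location */
	long page = sysconf(_SC_PAGESIZE);
	char *map = MAP_FAILED;
	char from_file = 0;
	int ret = -1;
	int fd;

	if (page <= 0)
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file lives on until fd is closed */

	/* One page of file data, first byte 'A'. */
	if (ftruncate(fd, page) != 0 || pwrite(fd, "A", 1, 0) != 1)
		goto out_close;

	/* R/O private mapping: a read fault maps the pagecache page. */
	map = mmap(NULL, page, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;
	if (map[0] != 'A')
		goto out_unmap;

	/* Make it writable and write: the COW handler installs an anon page. */
	if (mprotect(map, page, PROT_READ | PROT_WRITE) != 0)
		goto out_unmap;
	map[0] = 'B';

	/* The file (pagecache) must still contain the old byte. */
	if (pread(fd, &from_file, 1, 0) != 1)
		goto out_unmap;
	if (map[0] == 'B' && from_file == 'A')
		ret = 0;

out_unmap:
	munmap(map, page);
out_close:
	close(fd);
	return ret;
}
```

Calling cow_after_mprotect_demo() from a small main() and checking for a
0 return is enough to try it. After the write, the mapping and the file
disagree, so anything still holding a reference to the original pagecache
page no longer sees what the process sees through its mapping.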