On Fri, Sep 08, 2023 at 06:48:05PM +0200, David Hildenbrand wrote:
> vmsplice_to_pipe() -> iter_to_pipe() -> iov_iter_get_pages2()
>
> So it ends up calling get_user_pages_fast()
>
> ... and not using FOLL_PIN|FOLL_LONGTERM
>
> Why FOLL_LONGTERM? Because it's a longterm pin, where unprivileged users
> can grab a reference on a page for all eternity, breaking CMA and memory
> hotunplug (well, and harming compaction).
>
> Why FOLL_PIN? Well FOLL_LONGTERM only applies to FOLL_PIN. But for
> anonymous memory, this will also take care of the last remaining hugetlb
> COW test (trigger COW unsharing) as commented back in:
>
> https://lore.kernel.org/all/02063032-61e7-e1e5-cd51-a50337405159@xxxxxxxxxx/

Well, I'm not against it.  It just isn't required for dealing with the
file system writeback vs. GUP modification race this thread was started
for.

>> Can KVM page tables use file backed shared mappings?
>
> Yes, usually shmem and hugetlb. But with things like emulated
> NVDIMMs/virtio-pmem for VMs, easily also ordinary files.
>
> But it's really not ordinary write access through GUP. It's write access
> via a secondary page table (secondary MMU), that's synchronized to the
> process page table -- just like if the CPU would be writing to the page
> using the process page tables (primary MMU).

Writing through the process page tables takes a write fault on the first
write, which calls into ->page_mkwrite in the file system.  Does the
synchronization take care of that?  If not, we need to add or emulate it.

> ptrace will find the pagecache page writable in the page table (PTE write
> bit set), if it intends to write to the page (FOLL_WRITE). If it is not
> writable, it will trigger a page fault that informs the file system.

Yes, that case is (mostly) fine.

> With an FS that wants writenotify, we will not map a page writable (PTE
> write bit not set) unless it is dirty (PTE dirty bit set) IIRC.
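As a rough illustration of the writenotify behaviour described above, here is a toy userspace model in C: a store through the process page tables only reaches the file system's ->page_mkwrite when the PTE write bit is clear, and writeback clears that bit again so the next store faults. struct model_page, cpu_write(), fs_writeback(), fs_page_mkwrite() and writenotify_demo() are hypothetical names; this is a sketch of the idea under those assumptions, not how the kernel implements it.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of one pagecache page mapped into
 * one process. None of these names are kernel APIs. */
struct model_page {
    bool pte_write;    /* PTE write bit */
    bool dirty;        /* page dirty state */
    int mkwrite_calls; /* how often the FS was notified (->page_mkwrite) */
};

/* Stand-in for the file system's ->page_mkwrite hook. */
static void fs_page_mkwrite(struct model_page *p)
{
    p->mkwrite_calls++;
}

/* A CPU store through the process page tables (primary MMU). */
static void cpu_write(struct model_page *p)
{
    if (!p->pte_write) {
        /* Write fault: notify the FS first, then map the page writable. */
        fs_page_mkwrite(p);
        p->pte_write = true;
    }
    p->dirty = true;
}

/* Writeback: write the data out, mark the page clean and clear the PTE
 * write bit so the next store faults and notifies the FS again. */
static void fs_writeback(struct model_page *p)
{
    p->dirty = false;
    p->pte_write = false;
}

/* Runs a short sequence of stores and one writeback on a fresh page. */
struct model_page writenotify_demo(void)
{
    struct model_page pg = { false, false, 0 };

    cpu_write(&pg);    /* first store faults: 1 notification */
    cpu_write(&pg);    /* PTE already writable: no new notification */
    fs_writeback(&pg); /* clean and write-protect */
    cpu_write(&pg);    /* faults again: 2 notifications */
    return pg;
}
```

Calling writenotify_demo() leaves mkwrite_calls at 2: only the first store after the page was cleaned faults, so the file system sees exactly one notification per clean-to-dirty transition.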
>
> So are we concerned about a race between the filesystem removing the PTE
> write bit (to catch next write access before it gets dirtied again) and
> ptrace marking the page dirty?

Yes.  This is the race that we've run into with various GUP users.

> Yes. However, secondary MMU users (like KVM) would need some way to keep
> making use of that; ideally, using a proper separate interface instead of
> (ab)using plain GUP and confusing people :)

I'm all for that.
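A minimal sketch of the race discussed above, written as one deterministic interleaving in a userspace C model: a GUP-style user grabs a pointer while the page is writable, writeback then cleans the page and write-protects the PTE, and the GUP user afterwards modifies the data and marks the page dirty directly, without any write fault. struct race_page, gup_write(), race_writeback(), gup_put_dirty() and race_demo() are hypothetical stand-ins, not kernel functions.

```c
#include <stdbool.h>

/* Hypothetical model of the writeback vs. GUP race; names are
 * illustrative only and do not correspond to kernel APIs. */
struct race_page {
    bool pte_write;            /* PTE write bit */
    bool dirty;                /* page dirty state */
    bool notified_since_clean; /* ->page_mkwrite seen since last writeback */
    int data;                  /* page contents */
};

/* Stand-in for the file system's ->page_mkwrite hook. */
static void race_page_mkwrite(struct race_page *p)
{
    p->notified_since_clean = true;
}

/* GUP with FOLL_WRITE: faults the page writable (notifying the FS) and
 * returns a pointer the caller keeps using after the PTE may change. */
static int *gup_write(struct race_page *p)
{
    if (!p->pte_write) {
        race_page_mkwrite(p);
        p->pte_write = true;
    }
    return &p->data;
}

/* Writeback: clean the page and write-protect the PTE. */
static void race_writeback(struct race_page *p)
{
    p->dirty = false;
    p->pte_write = false;
    p->notified_since_clean = false;
}

/* The GUP user finishes: it marks the page dirty directly, bypassing the
 * PTE, so no write fault and no ->page_mkwrite happen. */
static void gup_put_dirty(struct race_page *p)
{
    p->dirty = true;
}

/* Returns true if the FS expectation "dirty implies notified since the
 * last writeback" ends up violated. */
bool race_demo(void)
{
    struct race_page pg = { false, false, false, 0 };
    int *kaddr = gup_write(&pg); /* FS notified once here */

    race_writeback(&pg); /* page cleaned, PTE write-protected */
    *kaddr = 42;         /* data modified behind the FS's back */
    gup_put_dirty(&pg);  /* dirtied without a new notification */

    return pg.dirty && !pg.notified_since_clean;
}
```

race_demo() returns true: the page ends up dirty although ->page_mkwrite was never called after the last writeback, which is the state a file system relying on writenotify does not expect.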