On Wed, Sep 06, 2023 at 11:42:33AM +0200, David Hildenbrand wrote:
>> and iov_iter_get_pages_alloc2. We have three file system direct I/O
>> users of those left: ceph, fuse and nfs. Lei Huang has sent patches
>> to convert fuse to iov_iter_extract_pages which I'd love to see merged,
>> and we'd need equivalent work for ceph and nfs.
>>
>> The non-file system uses are in the vmsplice code, which only reads
>
> vmsplice really has to be fixed to specify FOLL_PIN|FOLL_LONGTERM for good;
> I recall that David Howells had patches for that at one point. (at least to
> use FOLL_PIN)

Hmm, unless I'm misreading the code, vmsplice is only using
iov_iter_get_pages2 for reading from the user address space anyway.
Or am I missing something?

>> After that we might have to do an audit of the raw get_user_pages APIs,
>> but there probably aren't many that modify file backed memory.
>
> ptrace should apply: that ends up doing a FOLL_GET|FOLL_WRITE.

Yes, if that ends up on file backed shared mappings we also need a pin.

> Further, KVM ends up using FOLL_GET|FOLL_WRITE to populate the second-level
> page tables for VMs, and uses MMU notifiers to synchronize the second-level
> page tables with process page table changes. So once a PTE goes from
> writable -> r/o in the process page table, the second level page tables for
> the VM will get updated. Such MMU users are quite different from ordinary
> GUP users.

Can KVM page tables use file backed shared mappings?

> Converting ptrace might not be desired/required as well (the reference is
> dropped immediately after the read/write access).

But the pin is needed to make sure the file system can account for
dirtying the pages. That's something we fundamentally can't do with get.

> The end goal as discussed a couple of times would be to limit FOLL_GET
> in general to only a couple of users that can be audited and keep using it
> for a good reason. Arbitrary drivers that perform DMA should stop using it
> (and ideally be prevented from using it) and switch to FOLL_PIN.

Agreed, that's where I'd like to get to. Preferably with the non-pin API
not even being exported to modules.
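To illustrate why a plain get reference cannot tell the file system that a page may be written behind its back, here is a simplified userspace model, not kernel code. It assumes the small-page scheme where the kernel adds GUP_PIN_COUNTING_BIAS (1U << 10) to the page refcount for each FOLL_PIN reference, so that a check in the spirit of folio_maybe_dma_pinned() can detect it, whereas a FOLL_GET reference just adds one and looks like any other reference. All model_* names are hypothetical, and large folios (which the kernel tracks differently) are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code. The bias value mirrors
 * the kernel's GUP_PIN_COUNTING_BIAS (1U << 10) for small pages.
 */
#define MODEL_PIN_BIAS (1U << 10)

struct model_page {
	unsigned int refcount; /* every reference, get and pin alike */
};

/* FOLL_GET-style reference: +1, indistinguishable from any other ref. */
static inline void model_get(struct model_page *p)
{
	p->refcount += 1;
}

static inline void model_put(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount -= 1;
}

/* FOLL_PIN-style reference: adds the bias so it can be detected later. */
static inline void model_pin(struct model_page *p)
{
	p->refcount += MODEL_PIN_BIAS;
}

static inline void model_unpin(struct model_page *p)
{
	assert(p->refcount >= MODEL_PIN_BIAS);
	p->refcount -= MODEL_PIN_BIAS;
}

/*
 * Heuristic "maybe pinned" check: false positives are possible (for
 * example 1024 or more plain gets), but a real pin is never missed.
 * With only gets, there is nothing for the file system to look at.
 */
static inline bool model_maybe_dma_pinned(const struct model_page *p)
{
	return p->refcount >= MODEL_PIN_BIAS;
}
```

Starting from a page with refcount 1, two model_get() calls leave model_maybe_dma_pinned() false, while a single model_pin() makes it true. That asymmetry is the point made above: only the pin gives writeback something to check before assuming the page contents are stable.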