Re: getting rid of the last memory modifications through gup(FOLL_GET)

On 05.09.23 16:16, Christoph Hellwig wrote:
> Hi all,

Hi,


> we've made some nice progress on converting code that modifies user
> memory to the pin_user_pages interface, especially through the work
> from David Howells on iov_iter_extract_pages.  This thread tries to
> coordinate on how to finish off this work.
>
> The obvious next step is the remaining users of iov_iter_get_pages2
> and iov_iter_get_pages_alloc2.  We have three file system direct I/O
> users of those left: ceph, fuse and nfs.  Lei Huang has sent patches
> to convert fuse to iov_iter_extract_pages which I'd love to see merged,
> and we'd need equivalent work for ceph and nfs.
>
> The non-file system uses are in the vmsplice code, which only reads

vmsplice really has to be fixed for good to specify FOLL_PIN|FOLL_LONGTERM; I recall that David Howells had patches for that at one point (at least to use FOLL_PIN).

> from the pages (but would still benefit from an iov_iter_extract_pages
> conversion), and in net.  Out of the users in net, all but the 9p code
> appear to be for reads from memory, so they don't pin even if a
> conversion would be nice to retire iov_iter_get_pages* APIs.
>
> After that we might have to do an audit of the raw get_user_pages APIs,
> but there probably aren't many that modify file backed memory.

ptrace should qualify here, as it ends up doing a FOLL_GET|FOLL_WRITE.

Further, KVM ends up using FOLL_GET|FOLL_WRITE to populate the second-level page tables for VMs, and uses MMU notifiers to synchronize the second-level page tables with process page table changes. So once a PTE goes from writable -> r/o in the process page table, the second level page tables for the VM will get updated. Such MMU users are quite different from ordinary GUP users.

Converting the latter to FOLL_PIN is not desired, as it would implicitly trigger COW-unsharing on KSM pages. GUP+MMU notifiers is different from ordinary GUP+read/write, where there is no such synchronization.
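
To illustrate the synchronization described above, here is a minimal userspace sketch. It is a toy model and not kernel code: every name in it (toy_mm, toy_vm, toy_write_protect and so on) is hypothetical, and it only assumes the behaviour stated above, namely that a secondary MMU user drops its mapping when notified and rebuilds it from the process page table afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page table entry: just the two bits that matter here. */
struct toy_pte {
    bool present;
    bool writable;
};

/* Hypothetical notifier callback, invoked before the primary PTE changes. */
typedef void (*toy_notifier_fn)(void *ctx);

/* Toy process address space: one PTE plus one registered notifier. */
struct toy_mm {
    struct toy_pte pte;
    toy_notifier_fn invalidate;
    void *ctx;
};

/* Toy VM: one second-level page table entry mirroring the process PTE. */
struct toy_vm {
    struct toy_pte spte;
};

/* Notifier implementation: drop the second-level mapping. */
static void toy_vm_invalidate(void *ctx)
{
    struct toy_vm *vm = ctx;

    vm->spte.present = false;
    vm->spte.writable = false;
}

/* Second-level fault: rebuild the mapping from the current process PTE. */
static void toy_vm_fault(struct toy_vm *vm, const struct toy_mm *mm)
{
    vm->spte = mm->pte;
}

/* Primary side: writable -> r/o, notifying the secondary MMU first. */
static void toy_write_protect(struct toy_mm *mm)
{
    if (mm->invalidate)
        mm->invalidate(mm->ctx);
    mm->pte.writable = false;
}
```

After toy_write_protect(), the VM entry is gone, and the next toy_vm_fault() picks up the read-only state. An ordinary GUP user without a notifier has no such hook, so it keeps using whatever page it looked up earlier.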

Converting ptrace might not be desired/required either (the reference is dropped immediately after the read/write access).

The end goal, as discussed a couple of times, would be to limit FOLL_GET in general to only a couple of users that can be audited and that keep using it for a good reason. Arbitrary drivers that perform DMA should stop using it (and ideally be prevented from using it) and switch to FOLL_PIN.
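
As a rough illustration of why the distinction matters, the following userspace sketch models how a FOLL_PIN reference can be told apart from a FOLL_GET reference. It is a toy model under stated assumptions: the toy_* names are hypothetical, the bias of 1024 mirrors GUP_PIN_COUNTING_BIAS as used for small pages, and the check is only in the spirit of page_maybe_dma_pinned(), not a copy of it.

```c
#include <stdbool.h>

/* Assumption: same value as the kernel's GUP_PIN_COUNTING_BIAS (1 << 10). */
#define TOY_PIN_BIAS 1024

/* Hypothetical stand-in for struct page: only the reference count. */
struct toy_page {
    int refcount;
};

/* FOLL_GET-style reference: looks like any other reference on the page. */
static void toy_get(struct toy_page *p)
{
    p->refcount += 1;
}

static void toy_put(struct toy_page *p)
{
    p->refcount -= 1;
}

/* FOLL_PIN-style reference: adds a large bias so pins stand out. */
static void toy_pin(struct toy_page *p)
{
    p->refcount += TOY_PIN_BIAS;
}

static void toy_unpin(struct toy_page *p)
{
    p->refcount -= TOY_PIN_BIAS;
}

/* Heuristic: a count at or above the bias suggests at least one pin.
 * False positives are possible if enough plain references accumulate. */
static bool toy_maybe_pinned(const struct toy_page *p)
{
    return p->refcount >= TOY_PIN_BIAS;
}
```

In this model a page held only via toy_get() is indistinguishable from a page with a transient reference, which is why code that needs to know about ongoing DMA cannot rely on FOLL_GET users.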

--
Cheers,

David / dhildenb



