Re: getting rid of the last memory modifications through gup(FOLL_GET)

On Wed, Sep 06, 2023 at 11:42:33AM +0200, David Hildenbrand wrote:
>> and iov_iter_get_pages_alloc2.  We have three file system direct I/O
>> users of those left: ceph, fuse and nfs.  Lei Huang has sent patches
>> to convert fuse to iov_iter_extract_pages which I'd love to see merged,
>> and we'd need equivalent work for ceph and nfs.
>>
>> The non-file system uses are in the vmsplice code, which only reads
>
> vmsplice really has to be fixed to specify FOLL_PIN|FOLL_LONGTERM for good; 
> I recall that David Howells had patches for that at one point. (at least to 
> use FOLL_PIN)

Hmm, unless I'm misreading the code, vmsplice is only using
iov_iter_get_pages2 for reading from the user address space anyway.
Or am I missing something?

>> After that we might have to do an audit of the raw get_user_pages APIs,
>> but there probably aren't many that modify file backed memory.
>
> ptrace should apply, as it ends up doing a FOLL_GET|FOLL_WRITE.

Yes, if that ends up on file backed shared mappings we also need a pin.

> Further, KVM ends up using FOLL_GET|FOLL_WRITE to populate the second-level 
> page tables for VMs, and uses MMU notifiers to synchronize the second-level 
> page tables with process page table changes. So once a PTE goes from 
> writable -> r/o in the process page table, the second level page tables for 
> the VM will get updated. Such MMU users are quite different from ordinary 
> GUP users.

Can KVM page tables use file backed shared mappings?

> Converting ptrace might not be desired/required as well (the reference is 
> dropped immediately after the read/write access).

But the pin is needed to make sure the file system can account for
dirtying the pages.  Something we fundamentally can't do with get.
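
As a rough illustration of why that is, here is a toy userspace model (not
kernel code; the toy_* names and struct are made up for this sketch).  It
assumes the small-page scheme where a pin adds a large bias
(GUP_PIN_COUNTING_BIAS, 1024) to the page refcount, so that the writeback
side can ask "maybe pinned?", while a plain get looks like any other
reference:

```c
/* Toy userspace model, NOT kernel code: a pin is detectable, a get is not. */
#include <assert.h>
#include <stdbool.h>

/* Same value as the kernel's GUP_PIN_COUNTING_BIAS for small pages. */
#define TOY_PIN_BIAS 1024u

/* Hypothetical stand-in for struct page / struct folio. */
struct toy_page {
	unsigned int refcount;	/* shared by page cache, mappings, gets and pins */
	bool dirty;
};

/* FOLL_GET-style reference: indistinguishable from any other reference. */
static inline void toy_get(struct toy_page *p)
{
	p->refcount += 1;
}

static inline void toy_put(struct toy_page *p)
{
	assert(p->refcount >= 1);
	p->refcount -= 1;
}

/* FOLL_PIN-style reference: adds a large bias that can be tested for later. */
static inline void toy_pin(struct toy_page *p)
{
	p->refcount += TOY_PIN_BIAS;
}

/* Models the idea behind unpin_user_pages_dirty_lock(): dirty, then unpin. */
static inline void toy_unpin_dirty(struct toy_page *p)
{
	assert(p->refcount >= TOY_PIN_BIAS);
	p->dirty = true;
	p->refcount -= TOY_PIN_BIAS;
}

/* Heuristic check, in the spirit of folio_maybe_dma_pinned(). */
static inline bool toy_maybe_pinned(const struct toy_page *p)
{
	return p->refcount >= TOY_PIN_BIAS;
}
```

In this model a page held only through toy_get() never shows up in
toy_maybe_pinned(), so nothing on the file system side can tell that
someone may still write to it.  The check is only a heuristic: enough
plain references would also cross the bias, hence "maybe".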

> The end goal as discussed a couple of times would be to limit FOLL_GET 
> in general only to a couple of users that can be audited and keep using it 
> for a good reason. Arbitrary drivers that perform DMA should stop using it 
> (and ideally be prevented from using it) and switch to FOLL_PIN.

Agreed, that's where I'd like to get to.  Preferably with the non-pin
API not even being exported to modules.


