Re: [PATCH] mm/gup: restore the ability to pin more than 2GB at a time

On 30.10.24 07:50, John Hubbard wrote:
On 10/29/24 11:18 PM, Alistair Popple wrote:
John Hubbard <jhubbard@xxxxxxxxxx> writes:
On 10/29/24 9:42 PM, Christoph Hellwig wrote:
On Tue, Oct 29, 2024 at 09:39:15PM -0700, John Hubbard wrote:
...
Because pinning down these amounts of memory is completely insane.
I don't mind the switch to kvmalloc, but we need to put in an upper
bound of what can be pinned.

I'm wondering though, how it is that we decide how much of the user's
system we prevent them from using? :)  People with hardware accelerators
do not always have page fault capability, and yet these troublesome
users insist on stacking their system full of DRAM and then pointing
the accelerator to it.

How would we choose a value? Memory sizes keep going up...

The obvious answer is you let users decide. I did have a patch series to
do that via a cgroup[1]. However I dropped that series mostly because I
couldn't find any users of such a limit to provide feedback on how they
would use it or how they wanted it to work.


Trawling through the discussion there, I see that Jason Gunthorpe mentioned:

"Things like VFIO & KVM use cases effectively pin 90% of all system memory"

The unusual thing is not the amount of system memory we are pinning but *how many* pages we try to pin in a single call.

If you stare at vfio_pin_pages_remote(), it batches the pinning:

long req_pages = min_t(long, npage, batch->capacity);

where the batch capacity is bounded by

#define VFIO_BATCH_MAX_CAPACITY (PAGE_SIZE / sizeof(struct page *))
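
To put a number on that batch size, here is a rough user-space sketch of the arithmetic. The SKETCH_* names and the 4 KiB page size are assumptions made for illustration only; they are not kernel identifiers, and the real value depends on the architecture's PAGE_SIZE and pointer width.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Opaque stand-in; only the size of a pointer to it matters here. */
struct page;

/* Same expression as the VFIO_BATCH_MAX_CAPACITY define quoted above. */
#define SKETCH_BATCH_MAX_CAPACITY (SKETCH_PAGE_SIZE / sizeof(struct page *))

/* Same clamp as min_t(long, npage, batch->capacity). */
static long sketch_req_pages(long npage, long capacity)
{
	return npage < capacity ? npage : capacity;
}

/* Bytes of user memory covered by one full batch of pinned pages. */
static unsigned long sketch_batch_bytes(void)
{
	return SKETCH_BATCH_MAX_CAPACITY * SKETCH_PAGE_SIZE;
}
```

Under those assumptions, with 8-byte pointers, one batch holds 512 page pointers and covers 2 MiB, so a single call never asks for more than that no matter how large npage is.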


So you can fix this in your driver ;)


We should maybe try a similar limit internally: if you call pin_user_pages_remote() with a large page count, we'll cap it at some magic value (similar to the above). The caller will simply notice that not all pages were pinned and will retry for the remainder.

See get_user_pages_remote(): "Returns either number of pages pinned (which may be less than the number requested), or an error. Details about the return value:"
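
A minimal user-space sketch of what such a cap plus a caller-side retry could look like. Everything named sketch_* or SKETCH_* is hypothetical: sketch_pin_capped() merely stands in for a pin call that may return fewer pages than requested, the cap of 512 is an arbitrary example value, and no real page pinning happens here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical internal cap; the "magic value" is an assumption. */
#define SKETCH_PIN_CAP 512L

/*
 * Stand-in for a capped pin call (not a kernel API): records at most
 * SKETCH_PIN_CAP page frame numbers and returns how many it handled,
 * or a negative value on error.
 */
static long sketch_pin_capped(unsigned long start_pfn, long nr_pages,
			      unsigned long *pfns)
{
	long n = nr_pages < SKETCH_PIN_CAP ? nr_pages : SKETCH_PIN_CAP;

	if (nr_pages <= 0)
		return -1;
	for (long i = 0; i < n; i++)
		pfns[i] = start_pfn + i;
	return n;
}

/*
 * Caller side: a short return is not an error, so keep calling for the
 * remainder. Returns the number of pages handled, or the error if
 * nothing was handled at all. *calls counts the underlying calls.
 */
static long sketch_pin_all(unsigned long start_pfn, long nr_pages,
			   unsigned long *pfns, long *calls)
{
	long done = 0;

	*calls = 0;
	while (done < nr_pages) {
		long ret = sketch_pin_capped(start_pfn + done,
					     nr_pages - done, pfns + done);

		(*calls)++;
		if (ret <= 0)
			return done ? done : ret;
		done += ret;
	}
	return done;
}
```

With a cap of 512, a request for 1300 pages completes in three underlying calls (512 + 512 + 276). A real caller would also have to unpin whatever was already pinned if a later call fails, which this sketch leaves out.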


Alternatively, I recall there was a way to avoid the temporary allocation ... let me hack up a prototype real quick.
--
Cheers,

David / dhildenb




