On 31.01.23 15:10, Jason Gunthorpe wrote:
On Tue, Jan 31, 2023 at 03:06:10PM +0100, David Hildenbrand wrote:
On 31.01.23 15:03, Jason Gunthorpe wrote:
On Tue, Jan 31, 2023 at 02:57:20PM +0100, David Hildenbrand wrote:
I'm excited by this series, thanks for making it.
The pin accounting has been a long standing problem and cgroups will
really help!
Indeed. I'm curious how GUP-fast, pinning the same page multiple times, and
pinning subpages of larger folios are handled :)
The same as today. The pinning is done based on the result from GUP,
and we charge every returned struct page.
So duplicates are counted multiple times and folios are ignored.
Removing duplicate charges would be costly, it would require storage
to keep track of how many times individual pages have been charged to
each cgroup (e.g. a per-cgroup xarray of integers indexed by PFN).
It doesn't seem worth the cost, IMHO.
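A minimal userspace sketch (not kernel code) of the two accounting schemes contrasted above. All names (struct pin_cgroup, charge_per_page(), charge_dedup()) are hypothetical. A fixed-size open-addressing table stands in for the per-cgroup xarray indexed by PFN, and unpinning/uncharging is omitted. The point is only that deduplication needs per-cgroup, per-PFN state, while the current scheme needs a single counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: TABLE_SIZE must exceed the number of distinct
 * PFNs tracked per cgroup; a real kernel would use an xarray. */
#define TABLE_SIZE 64

struct pfn_slot {
	unsigned long pfn;
	unsigned int pins;  /* 0 means the slot is unused */
};

struct pin_cgroup {
	unsigned long charged;             /* pages charged against the limit */
	struct pfn_slot table[TABLE_SIZE]; /* extra storage only dedup needs */
};

/* Current behaviour: every struct page returned by GUP is charged,
 * so pinning the same page twice charges it twice. */
static void charge_per_page(struct pin_cgroup *cg,
			    const unsigned long *pfns, size_t n)
{
	(void)pfns;  /* the PFNs themselves are irrelevant here */
	cg->charged += n;
}

/* Linear probing; without deletions an empty slot means "not present". */
static struct pfn_slot *lookup_slot(struct pin_cgroup *cg, unsigned long pfn)
{
	size_t start = pfn % TABLE_SIZE;

	for (size_t probe = 0; probe < TABLE_SIZE; probe++) {
		struct pfn_slot *s = &cg->table[(start + probe) % TABLE_SIZE];

		if (s->pins == 0 || s->pfn == pfn)
			return s;
	}
	return NULL;  /* table full */
}

/* Deduplicated alternative: charge a PFN only on its first pin in
 * this cgroup; further pins just bump the per-PFN count. */
static int charge_dedup(struct pin_cgroup *cg,
			const unsigned long *pfns, size_t n)
{
	assert(n == 0 || pfns != NULL);

	for (size_t k = 0; k < n; k++) {
		struct pfn_slot *s = lookup_slot(cg, pfns[k]);

		if (!s)
			return -1;
		if (s->pins == 0) {
			s->pfn = pfns[k];
			cg->charged++;
		}
		s->pins++;
	}
	return 0;
}
```

For example, pinning PFNs {100, 101, 102, 100} (the same page pinned twice, as with two VFIO containers) makes charge_per_page() charge 4 pages, while charge_dedup() charges 3 but has to carry the whole table in every cgroup.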
We've made a lot of investment now with iommufd to remove the most
annoying sources of duplicated pins so it is much less of a problem in
the qemu context at least.
Wasn't there the discussion regarding using vfio+io_uring+rdma+$whatever on
a VM and requiring multiple times the VM size as the memlock limit?
Yes, but iommufd gives us some more options to mitigate this.
e.g. it makes some logical sense to point RDMA at the iommufd page
table that is already pinned when trying to DMA from guest memory, in
this case it could ride on the existing pin.
Right, I suspect one issue is that the address space layout for the
RDMA device might be completely different. But I'm no expert on IOMMUs
at all :)
I do understand that at least multiple VFIO containers could benefit from
only pinning once (IIUC that might have been an issue?).
Would it be the same now, just that we need multiple times the pin
limit?
Yes
Okay, thanks.
It's all still a big improvement, because I also asked for TDX
restrictedmem to be accounted somehow as unmovable.
--
Thanks,
David / dhildenb