On 5/8/24 16:58, Jason Gunthorpe wrote:
On Wed, May 08, 2024 at 04:44:32PM +0100, Pavel Begunkov wrote:
like a weird and indirect way to get there. Why can't io_uring just be
the entity that does the final free and not mess with the allocator
logic?
Then the user has to do a syscall (e.g. via io_uring) to return pages,
and there we'd need to care how to put the pages efficiently, i.e.
hitting the page pool's fast path, e.g. by hoping napi is scheduled, and
scheduled on the CPU we're running on, or maybe transferring the pages
to the right CPU first.
Compare it with userspace putting pages into a ring, and the allocator
taking from there when needed without any extra synchronisation and
hassle just because it's a sole consumer.
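
To make the sole-consumer point concrete, here is a minimal sketch of a
single-producer/single-consumer refill ring in plain C11. It is not the
code from the patch set: the names (refill_ring, ring_put, ring_get,
RING_ENTRIES) and the layout are assumptions made up for illustration.
The producer stands in for userspace, the consumer for the allocator
running in napi context. Because each index has exactly one writer, no
lock is needed, only acquire/release ordering on head and tail.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* hypothetical size, must be a power of two */

/* Hypothetical refill ring: one producer publishes buffer offsets,
 * one consumer drains them. */
struct refill_ring {
	_Atomic uint32_t head;          /* written only by the consumer */
	_Atomic uint32_t tail;          /* written only by the producer */
	uint32_t offsets[RING_ENTRIES]; /* buffer offsets being returned */
};

/* Producer side (userspace in the discussion): return one buffer. */
static bool ring_put(struct refill_ring *r, uint32_t off)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail - head == RING_ENTRIES)
		return false; /* ring is full */

	r->offsets[tail & (RING_ENTRIES - 1)] = off;
	/* Release: the entry must be visible before the new tail is. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer side (the allocator): take one returned buffer, if any. */
static bool ring_get(struct refill_ring *r, uint32_t *off)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return false; /* nothing to take */

	*off = r->offsets[head & (RING_ENTRIES - 1)];
	/* Release: the slot may be reused only after it has been read. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}
```

The consumer never takes a lock or issues a read-modify-write; it only
loads tail, reads a slot, and stores head. A real kernel consumer would
also have to treat every value read from the shared ring as untrusted.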
Wow, that sounds a bit terrifying for security, but I guess I can see
your point.
Mind elaborating about security? "No synchronisation" is for grabbing
from the ring, it's napi exclusive, but it does refcounting to make sure
there are no previous net users left and the userspace doesn't try
anything funny like returning a page twice. And it's not even a page
but rather a separately refcounted buffer represented by an offset
from the userspace POV. It doesn't even have to be page sized, hw
benefits from smaller chunks.
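
The refcounting check described above could look roughly like the
sketch below. This is an illustration under stated assumptions, not the
actual implementation: buf_area, buf_state, buf_try_recycle, BUF_SIZE
and NR_BUFS are hypothetical names, the counters are plain integers
(a real version would need atomics), and the buffer size is arbitrary.
It shows an offset coming from userspace being validated, a double
return being rejected, and a buffer being recycled only once no net
users are left.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUF_SIZE 2048u /* hypothetical chunk size, smaller than a page */
#define NR_BUFS  4u    /* hypothetical number of buffers in the area */

/* Per-buffer accounting kept on the kernel side, out of userspace reach. */
struct buf_state {
	uint32_t user_refs; /* references currently handed to userspace */
	uint32_t net_refs;  /* references still held by the network stack */
};

struct buf_area {
	struct buf_state bufs[NR_BUFS];
};

/* Handle one offset returned by userspace.
 * Returns true only if the buffer may go back to the allocator. */
static bool buf_try_recycle(struct buf_area *a, uint64_t off)
{
	struct buf_state *b;

	/* Reject offsets that do not name the start of a valid buffer. */
	if (off % BUF_SIZE != 0 || off / BUF_SIZE >= NR_BUFS)
		return false;

	b = &a->bufs[off / BUF_SIZE];

	/* Userspace holds no reference: double return or bogus offset. */
	if (b->user_refs == 0)
		return false;

	b->user_refs--;

	/* Recycle only when neither userspace nor net users remain. */
	return b->user_refs == 0 && b->net_refs == 0;
}
```

With this shape, returning the same offset twice fails on the second
attempt because user_refs has already dropped to zero.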
You are replacing the whole allocator logic if you are effectively
putting the free list in userspace memory.
Jason
--
Pavel Begunkov