Re: [PATCH v7 11/15] io_uring/zcrx: implement zerocopy receive pp memory provider

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/15/24 23:14, Jakub Kicinski wrote:
On Thu, 14 Nov 2024 12:56:14 -0800 Mina Almasry wrote:
But I've been bringing this up a lot in the past (and offering
alternatives that don't introduce this overloading) and I think this
conversation has run its course. I'm unsure about this approach and
this could use another pair of eyes. Jakub, sorry to bother you but
you probably are the one that reviewed the whole net_iov stuff most
closely. Any chance you would take a look and provide direction here?
Maybe my concern is overblown...

Sorry I haven't read this code very closely (still!?) really hoping
for next version to come during the merge window when time is more
abundant :S

 From scanning the quoted context I gather you're asking about using
the elevated ->pp_ref_count for user-owned pages? If yes - I also
find that part.. borderline incorrect. The page pool stats will show
these pages as allocated which normally means held by the driver or
the stack. Pages given to the user should effectively leave the pool.

It can't just drop all net_iov refs here, otherwise the buffer might
be returned back to page pool and reused while the user still reading
the data. We can't even be smart in the release callback as it might
never get there and be reallocated purely via alloc.cache. And either
way, tunneling all buffers into ->release_netmem would be horrible
for performance, and it'd probably even need a check in
page_pool_recycle_in_cache().

Fixing it for devmem TCP (which also holds a net_iov ref while it's
given to user, so we're not unique here) sounds even harder as
they're stashed in a socket xarray page pool knows nothing about,
so it might need some extra counting on top?

This set has a problem with page_pool_alloc_frag*() helpers, so
we'd either need to explicitly chip away some bits from ->pp_ref_count
or move user counting out of net_iov and double the cost of one
of the main sources of overhead, and then being very inventive
optimising it in the future, but that won't solve the "should
leave the pool" problem.

If it's just stats printing, it should be quite easy to fix
for the current set, ala some kind of "mask out bits responsible
for user refs". And I don't immediately have an idea of how to
address it for devmem TCP.

Also note, that if sth happens with io_uring or such, those
"user" refs are going to be dropped by the kernel off a page
pool callback, so it's not about leaking buffers.

So I'm guessing this is some optimization.
Same for patch 8.

This one will need some more explanation, otherwise it'd be a guess
game. What is incorrect? The overall approach or some implementation
aspect? It's also worth to note that it's a private queue, stopping
the napi attached to it shouldn't interfere with other queues and
users in the system, that's it assuming steering configured right.

But let me not get sucked into this before we wrap up the net-next PR..

--
Pavel Begunkov




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux