Re: [PATCH net-next v8 11/17] io_uring/zcrx: implement zerocopy receive pp memory provider

Pavel Begunkov <asml.silence@xxxxxxxxx> · Thu, 12 Dec 2024 13:42:11 +0000

On 12/12/24 01:38, Jakub Kicinski wrote:
On Wed, 11 Dec 2024 14:42:43 +0000 Pavel Begunkov wrote:
I was thinking along the lines of transferring the ownership of
the frags. But let's work on that as a follow-up. Atomic add on

That's fine to leave out for now and deal with later, but what's
important for me when going through the preliminary shittification of
the project is to have a way to optimise it afterwards, a clear
understanding that it can't be left without it, and no strong opinions
that would block it.

The current cache situation is quite unfortunate, understandably so
given that it's aliased to struct page: pp_ref_count sits in the same
cache line as ->pp and other fields. Here an iov usually gets modified
by napi, then refcounted from the syscall; later a deferred skb put
drops the reference back in napi context, and some time after the line
gets washed out of the cache, the user finally returns the iov to the
page pool.

Let's not get distracted. It's very unusual to have arguments about
microoptimizations before the initial version of the code is merged :|

I can't avoid it since one of the goals is to save CPU cycles,
and it's not that micro either, but I hear you.

an exclusively owned cacheline is 2 cycles on AMD if I'm looking
correctly.

Sounds too good to be true considering x86 implies a full barrier
for atomics.

Right, but two barriers back to back will hopefully have a similar impact to one.

I wonder where the data comes from?

Agner's instruction tables. What source do you use?

Mostly observational, plus scattered hardware knowledge. The table
seems to say that the best-case chained latency is ~8 cycles on
Zen 4, pretty good!

--
Pavel Begunkov