Re: [PATCH net-next 4/4] xsk: add helper to get &xdp_desc's DMA and meta pointer in one go

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Simon Horman <horms@xxxxxxxxxx>
Date: Sun, 9 Feb 2025 11:03:44 +0000

> On Thu, Feb 06, 2025 at 07:26:29PM +0100, Alexander Lobakin wrote:
>> Currently, when your driver supports XSk Tx metadata and you want to
>> send an XSk frame, you need to do the following:
>>
>> * call external xsk_buff_raw_get_dma();
>> * call inline xsk_buff_get_metadata(), which calls external
>>   xsk_buff_raw_get_data() and then do some inline checks.
>>
>> This effectively means that the following piece:
>>
>> addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
>>
>> is done twice per frame, plus you have 2 external calls per frame, plus
>> this:
>>
>> 	meta = pool->addrs + addr - pool->tx_metadata_len;
>> 	if (unlikely(!xsk_buff_valid_tx_metadata(meta)))
>>
>> is always inlined, even if there's no meta or it's invalid.
>>
>> Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that
>> in one go. It returns a small structure with 2 fields: DMA address,
>> filled unconditionally, and metadata pointer, non-NULL only if it's
>> present and valid. The address correction is performed only once and
>> you also have only 1 external call per XSk frame, which does all the
>> calculations and checks outside of your hotpath. You only need to
>> check `if (ctx.meta)` for the metadata presence.
>> To not copy any existing code, derive address correction and getting
>> virtual and DMA address into small helpers. bloat-o-meter reports no
>> object code changes for the existing functionality.
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx>
> 
> Hi Alexander,
> 
> I think that this patch needs to be accompanied by at least one
> patch that uses xsk_buff_raw_get_ctx() in a driver.

This mini-series is the final part of my Chapter III, which was all
about prereqs in order to add libeth_xdp and then XDP for idpf.
This helper will be used in the next series (Chapter IV) I'll send once
this lands.

> 
> Also, as this seems to be an optimisation, some performance data would
> be nice too.

-1 Kb of object code which has an unrolled-by-8 loop which used this
function each iteration. I don't remember the perf numbers since it was
a year ago and since then I've been using this helper only, but it was
something around a couple procent (which is several hundred Kpps when it
comes to XSk).

> 
> Which brings me to my last point. I'd always understood that
> returning a struct was discouraged due to performance implications.

Rather stack usage, not perf implications. Compound returns are used
heavily throughout the kernel code when sizeof(result) <= 16 bytes.
Here it's also 16 bytes. Just the same as one __u128. Plus this function
doesn't recurse, so the stack won't blow up.

> Perhaps that information is out of date, doesn't apply because
> the returned struct is so small in this case, or just plain wrong.
> But I'd appreciate it if you could add some colour to this.

Moreover, the function is global, not inline, so passing a pointer here
instead of returning a struct may even behave worse in this case.

(and we'll save basically only 8 bytes on the stack, which I don't
 believe is worth it).

Thanks,
Olek




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux