On Mon, Feb 10, 2025 at 05:00:36PM +0100, Alexander Lobakin wrote: > From: Simon Horman <horms@xxxxxxxxxx> > Date: Sun, 9 Feb 2025 11:03:44 +0000 > > > On Thu, Feb 06, 2025 at 07:26:29PM +0100, Alexander Lobakin wrote: > >> Currently, when your driver supports XSk Tx metadata and you want to > >> send an XSk frame, you need to do the following: > >> > >> * call external xsk_buff_raw_get_dma(); > >> * call inline xsk_buff_get_metadata(), which calls external > >> xsk_buff_raw_get_data() and then do some inline checks. > >> > >> This effectively means that the following piece: > >> > >> addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr; > >> > >> is done twice per frame, plus you have 2 external calls per frame, plus > >> this: > >> > >> meta = pool->addrs + addr - pool->tx_metadata_len; > >> if (unlikely(!xsk_buff_valid_tx_metadata(meta))) > >> > >> is always inlined, even if there's no meta or it's invalid. > >> > >> Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that > >> in one go. It returns a small structure with 2 fields: DMA address, > >> filled unconditionally, and metadata pointer, non-NULL only if it's > >> present and valid. The address correction is performed only once and > >> you also have only 1 external call per XSk frame, which does all the > >> calculations and checks outside of your hotpath. You only need to > >> check `if (ctx.meta)` for the metadata presence. > >> To not copy any existing code, derive address correction and getting > >> virtual and DMA address into small helpers. bloat-o-meter reports no > >> object code changes for the existing functionality. > >> > >> Signed-off-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> > > > > Hi Alexander, > > > > I think that this patch needs to be accompanied by at least one > > patch that uses xsk_buff_raw_get_ctx() in a driver. > > This mini-series is the final part of my Chapter III, which was all > about prereqs in order to add libeth_xdp and then XDP for idpf. > This helper will be used in the next series (Chapter IV) I'll send once > this lands. Understood. If it's going to be used in chapter IV then, given that we've made it to chapter II, that is fine by me. > > Also, as this seems to be an optimisation, some performance data would > > be nice too. > > -1 Kb of object code which has an unrolled-by-8 loop which used this > function each iteration. I don't remember the perf numbers since it was > a year ago and since then I've been using this helper only, but it was > something around a couple procent (which is several hundred Kpps when it > comes to XSk). Thanks. It might be worth including some of that information in the commit message, but I don't feel strongly about it. > > > > > Which brings me to my last point. I'd always understood that > > returning a struct was discouraged due to performance implications. > > Rather stack usage, not perf implications. Compound returns are used > heavily throughout the kernel code when sizeof(result) <= 16 bytes. > Here it's also 16 bytes. Just the same as one __u128. Plus this function > doesn't recurse, so the stack won't blow up. Also understood. It seems my assumptions were somewhat wrong. So I have no objections to this approach. > > Perhaps that information is out of date, doesn't apply because > > the returned struct is so small in this case, or just plain wrong. > > But I'd appreciate it if you could add some colour to this. > > Moreover, the function is global, not inline, so passing a pointer here > instead of returning a struct may even behave worse in this case. > > (and we'll save basically only 8 bytes on the stack, which I don't > believe is worth it). > > Thanks, > Olek >