From: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> Date: Tue, 22 Oct 2024 17:42:13 +0200 > On Tue, Oct 15, 2024 at 04:53:48PM +0200, Alexander Lobakin wrote: >> Currently, when you send an XSk frame without metadata, you need to do > > you meant *with* metadata? Eeeeh... Maybe, I forgot already what I wanted to say =\ > >> the following: >> >> * call external xsk_buff_raw_get_dma(); >> * call inline xsk_buff_get_metadata(), which calls external >> xsk_buff_raw_get_data() and then do some inline checks. >> >> This effectively means that the following piece: >> >> addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr; >> >> is done twice per frame, plus you have 2 external calls per frame, plus >> this: >> >> meta = pool->addrs + addr - pool->tx_metadata_len; >> if (unlikely(!xsk_buff_valid_tx_metadata(meta))) >> >> is always inlined, even if there's no meta or it's invalid. > > when there is no meta you bail out early in xsk_buff_get_metadata() as > tx_metadata_len was not set, no? Yes, but this code is still inlined. See below (at the end of the reply). > >> >> Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that >> in one go. It returns a small structure with 2 fields: DMA address, >> filled unconditionally, and metadata pointer, valid only if it's >> present. The address correction is performed only once and you also >> have only 1 external call per XSk frame, which does all the calculations >> and checks outside of your hotpath. You only need to check >> `if (ctx.meta)` for the metadata presence. > > IMHO adding this might confuse future users which approach should be > preferred. It's a regular practice in the kernel that we have several functions to do +/- the same. It's up to the developer which one to pick, he reads the code and decides himself. > > Thinking out loud...couldn't we export address correction logic and pass > the corrected addr to xsk_buff_get_metadata and then add it to > pool->addrs. But that would require modifying existing callsites + > addressing xp_raw_get_dma() as well :< Yes, modifying current API requires touching the users. + keeping xsk_buff_get_metadata negates most the main purpose of this patch, see below. > > Standard question - any perf improvement when micro benchmarking? :P TBH I didn't test before/after with the meta enabled, but it was enough for me that using this function instead of the get_dma + get_meta pair reduces the object code size by 1 Kb when unrolling by 8. Thanks, Olek