> > > Was it noticable in some sort of performance test? > > > > This patch is part of the patchset found at > > https://lore.kernel.org/all/20230412162114.19389-3-kal.conley@xxxxxxxxxxx/ > > which is being actively discussed and needs to be resubmitted anyway > > because of a conflict. While the discussion continues, I am submitting > > this patch by itself because I think it's an improvement on its own > > (regardless of what happens with the rest of the linked patchset). On > > one system, I measured a performance regression of 2-3% with xdpsock > > and the linked changes without the current patch. With the current > > patch, the performance regression was no longer observed. > > Would be nice to have in commit message so reader has an idea the > perf numbers are in fact better. When I measured this patch by itself (on bpf-next), I didn't measure any statistically significant performance gains. However, it did allow me to avoid a regression when combined with the other linked patch (as mentioned). I don't know if it makes sense to mention that other change which is not even applied to any tree. I was mainly submitting this patch from the perspective of the code being better not contingent on any provable performance gains. > > > > > > > diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h > > > > index d318c769b445..a8d7b8a3688a 100644 > > > > --- a/include/net/xsk_buff_pool.h > > > > +++ b/include/net/xsk_buff_pool.h > > > > @@ -180,7 +180,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool, > > > > if (likely(!cross_pg)) > > > > return false; > > > > > > > > - return pool->dma_pages_cnt && > > > > + return pool->dma_pages && > > > > !(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK); > > > > } > > > > I would consider the above code part of the "fast path". It may be > > executed approximately once per frame in unaligned mode. > > In the unlikely case though is my reading. So really shouldn't > be called for every packet or we have other perf issues by that > likely() there. > > I assume the above is where the perf is being gained because below > two things are in setup/tear down. But then we are benchmarking > an unlikely() path? I was testing with large chunk sizes in unaligned mode (4000-4096 bytes) with ZC. For chunk sizes nearly as large as PAGE_SIZE the unlikely path is actually the main path. > > > > > This seems to be used in the setup/tear-down paths so your optimizing > > > a control side. Is there a fast path with this code? I walked the > > > ice driver. If its just setup code we should do whatever is more > > > readable. > > > > It is not only used in setup/tear-down paths (see above). > > Additionally, I believe the code is also _more_ readable with this > > patch applied. In particular, this patch reduces cognitive complexity > > since people (and compilers) reading the code don't need to > > additionally think about pool->dma_pages_cnt. > > > > > Both the _alloc_ cases read neighboring free_heads_cnt so your saving a load I guess? > > > This is so deep into micro-optimizing I'm curious if you could measure it? > > > > It is saving a load which also reduces code size. This will affect > > other decisions such as what to inline. Also in the linked patchset, > > dma_pages and dma_pages_cnt do not share a cache line (on x86_64). > > But again buried in an unlikely path. Sure but removing the conditional > altogether would be even better. Yeah, I think that is another improvement to consider. > So my understanding is ZC is preferred and default mode and copy modes > are primarily fall back modes. So we are punishing the good case here > for a fallback to copy mode. I think overall refactoring the code to > avoid burdoning the fast case with a fallback slow case would be ideal > solution. I agree that ZC is preferred and this patch is aimed at improving the ZC path. The performance gain I observed was for ZC. > However, I agree just on readability the patch is fine and good. No > objection on my side. But I think if we are making performance > arguments for 2-3% here the better thing to do is remove the check > and unlikely() and we would see better benchmarks when using the > ZC mode which as I understand it is what performance aware folks should > be doing. I totally agree that other better improvements exist but I don't think they make this patch any less desirable. This change is only meant as a small incremental improvement.