Yunsheng Lin <linyunsheng@xxxxxxxxxx> writes:

> On 2025/3/11 21:26, Toke Høiland-Jørgensen wrote:
>> Yunsheng Lin <linyunsheng@xxxxxxxxxx> writes:
>>
>>> On 2025/3/10 23:24, Toke Høiland-Jørgensen wrote:
>>>
>>>>>
>>>>> I guess that is one of the disadvantages that an advanced struct like
>>>>> Xarray is used:(
>>>>
>>>> Sure, there will be some overhead from using xarray, but I think the
>>>> simplicity makes up for it; especially since we can limit this to the
>>>
>>> As my understanding, it is more complicated, it is just that
>>> complexity is hidden before xarray now.
>>
>> Yes, which encapsulates the complexity into a shared abstraction that is
>> widely used in the kernel, so it does not add new complexity to the
>> kernel as a whole. Whereas your series adds a whole bunch of new
>> complexity to the kernel in the form of a new slab allocator.
>>
>>> Even if there is no space in 'struct page' to store the id, the
>>> 'struct page' pointer itself can be used as id if the xarray can
>>> use pointer as id. But it might mean the memory utilization might
>>> not be as efficient as it should be, and performance hurts too if
>>> there is more memory to be allocated and freed.
>>
>> I don't think it can. But sure, there can be other ways around this.
>>
>> FWIW, I don't think your idea of allocating page_pool_items to use as an
>> indirection is totally crazy, but all the complexity around it (the
>> custom slab allocator etc) is way too much. And if we can avoid the item
>> indirection that is obviously better.
>>
>>> It seems it is just a matter of choices between using tailor-made
>>> page_pool specific optimization and using some generic advanced
>>> struct like xarray.
>>
>> Yup, basically.
>>
>>> I chose the tailor-made one because it ensure least overhead as
>>> much as possibe from performance and memory utilization perspective,
>>> for example, the 'single producer, multiple consumer' guarantee
>>> offered by NAPI context can avoid some lock and atomic operation.
>>
>> Right, and my main point is that the complexity of this is not worth it :)
>
> Without the complexity, there is about 400ns performance overhead
> for Xarray compared to about 10ns performance overhead for the
> tailor-made one, which means there is about more than 200% performance
> degradation for page_pool03_slow test case:

Great, thanks for testing! I will include this in the non-RFC posting of
this series :)

>>>> cases where it's absolutely needed.
>>>
>>> The above can also be done for using page_pool_item too as the
>>> lower 2 bits can be used to indicate the pointer in 'struct page'
>>> is 'page_pool_item' or 'page_pool', I just don't think it is
>>> necessary yet as it might add more checking in the fast path.
>>
>> Yup, did think about using the lower bits to distinguish if it does turn
>> out that we can't avoid an indirection. See above; it's not actually the
>
> The 'memdesc' seems like an indirection to me when using that to shrink
> 'struct page' to a smaller size.

Yes, it does seem like we'll end up with an indirection of some kind
eventually. But let's cross that bridge when we get to it...

-Toke
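
For readers following the thread, the "lower 2 bits" idea mentioned above
(marking whether the pointer stored in 'struct page' refers to a
'page_pool_item' or directly to a 'page_pool') can be sketched roughly as
below. This is a minimal userspace illustration, not code from either
series: the struct layouts, the tag values and all pp_* helper names are
assumptions made up for the example, and it relies on both structs being
at least 4-byte aligned so the low two bits of their addresses are zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's definitions. */
struct page_pool {
	int id;
};

struct page_pool_item {
	struct page_pool *pool;	/* the indirection back to the owning pool */
};

/* Assumed tag encoding in the low two bits of the stored word. */
#define PP_TAG_MASK	((uintptr_t)0x3)
#define PP_TAG_POOL	((uintptr_t)0x0)	/* word points at a page_pool */
#define PP_TAG_ITEM	((uintptr_t)0x1)	/* word points at a page_pool_item */

/* Encode a direct page_pool pointer. */
static uintptr_t pp_tag_pool(struct page_pool *pool)
{
	/* Alignment must leave the tag bits free. */
	assert(((uintptr_t)pool & PP_TAG_MASK) == 0);
	return (uintptr_t)pool | PP_TAG_POOL;
}

/* Encode a pointer to an item, i.e. the indirect case. */
static uintptr_t pp_tag_item(struct page_pool_item *item)
{
	assert(((uintptr_t)item & PP_TAG_MASK) == 0);
	return (uintptr_t)item | PP_TAG_ITEM;
}

/* This is the extra check that would land in the fast path. */
static int pp_is_item(uintptr_t word)
{
	return (word & PP_TAG_MASK) == PP_TAG_ITEM;
}

/* Strip the tag and follow the indirection only when it is present. */
static struct page_pool *pp_resolve_pool(uintptr_t word)
{
	uintptr_t ptr = word & ~PP_TAG_MASK;

	if (pp_is_item(word))
		return ((struct page_pool_item *)ptr)->pool;
	return (struct page_pool *)ptr;
}
```

The branch in pp_resolve_pool() is the "more checking in the fast path"
cost raised in the quoted text: every lookup has to test the tag, even
when no indirection is in use.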