On 2023/6/16 2:26, Alexander Duyck wrote:
> On Thu, Jun 15, 2023 at 9:51 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>>
>> On Thu, 15 Jun 2023 15:17:39 +0800 Yunsheng Lin wrote:
>>>> Does hns3_page_order() set a good example for the users?
>>>>
>>>> static inline unsigned int hns3_page_order(struct hns3_enet_ring *ring)
>>>> {
>>>> #if (PAGE_SIZE < 8192)
>>>>         if (ring->buf_size > (PAGE_SIZE / 2))
>>>>                 return 1;
>>>> #endif
>>>>         return 0;
>>>> }
>>>>
>>>> Why allocate order 1 pages for buffers which would fit in a single page?
>>>> I feel like this soft of heuristic should be built into the API itself.
>>>
>>> hns3 only support fixed buf size per desc by 512 byte, 1024 bytes, 2048 bytes
>>> 4096 bytes, see hns3_buf_size2type(), I think the order 1 pages is for buf size
>>> with 4096 bytes and system page size with 4K, as hns3 driver still support the
>>> per-desc ping-pong way of page splitting when page_pool_enabled is false.
>>>
>>> With page pool enabled, you are right that order 0 pages is enough, and I am not
>>> sure about the exact reason we use the some order as the ping-pong way of page
>>> splitting now.
>>> As 2048 bytes buf size seems to be the default one, and I has not heard any one
>>> changing it. Also, it caculates the pool_size using something as below, so the
>>> memory usage is almost the same for order 0 and order 1:
>>>
>>>         .pool_size = ring->desc_num * hns3_buf_size(ring) /
>>>                         (PAGE_SIZE << hns3_page_order(ring)),
>>>
>>> I am not sure it worth changing it, maybe just change it to set good example for
>>> the users:) anyway I need to discuss this with other colleague internally and do
>>> some testing before doing the change.
>>
>> Right, I think this may be a leftover from the page flipping mode of
>> operation. But AFAIU we should leave the recycling fully to the page
>> pool now. If we make any improvements try to make them at the page pool
>> level.

I checked: the per-desc buf size of 4096 bytes does not seem to be used
for hns3, mainly because of the larger memory usage you mentioned below.

>>
>> I like your patches as they isolate the drivers from having to make the
>> fragmentation decisions based on the system page size (4k vs 64k but
>> we're hearing more and more about ARM w/ 16k pages). For that use case
>> this is great.

Yes, that is my point. For the hw case, the page splitting in page pool
is mainly to enable multiple descs to use the same page, as I understand
it.

>>
>> What we don't want is drivers to start requesting larger page sizes
>> because it looks good in iperf on a freshly booted, idle system :(
>
> Actually that would be a really good direction for this patch set to
> look at going into. Rather than having us always allocate a "page" it
> would make sense for most drivers to allocate a 4K fragment or the
> like in the case that the base page size is larger than 4K. That might
> be a good use case to justify doing away with the standard page pool
> page and look at making them all fragmented.

I am not sure I understand the above. Isn't the frag API able to support
allocating a 4K fragment when the base page size is larger than 4K, both
before and after this patch? What more do we need to do? (A rough sketch
of what I have in mind is at the end of this mail.)

>
> In the case of the standard page size being 4K a standard page would
> just have to take on the CPU overhead of the atomic_set and
> atomic_read for pp_ref_count (new name) which should be minimal as on
> most sane systems those just end up being a memory write and read.

If I understand you correctly, I think what you are trying to do may
break some of Jesper's benchmarking [1] :)

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
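
To make my question above a bit more concrete, here is the sketch I
mentioned. It is rough and untested, relies only on the existing
page_pool_alloc_frag() API and PP_FLAG_PAGE_FRAG as they are today, and
the function names (my_create_rx_pool()/my_alloc_rx_buf()) and the
pool_size value are made up purely for illustration:

#include <linux/dma-mapping.h>
#include <linux/sizes.h>
#include <net/page_pool.h>

/* Create a pool whose pages may be split into frags. */
static struct page_pool *my_create_rx_pool(struct device *dev, int nid)
{
        struct page_pool_params pp_params = {
                .flags          = PP_FLAG_PAGE_FRAG,
                .order          = 0,    /* one system page per pool page */
                .pool_size      = 1024, /* arbitrary for this example */
                .nid            = nid,
                .dev            = dev,
        };

        return page_pool_create(&pp_params);
}

/*
 * Always hand the hw a 4K buffer: with 4K system pages the frag is the
 * whole page, with 16K/64K system pages the page is split into 4/16
 * frags and the frag count tracks the users.
 */
static struct page *my_alloc_rx_buf(struct page_pool *pool,
                                    unsigned int *offset)
{
        return page_pool_alloc_frag(pool, offset, SZ_4K,
                                    GFP_ATOMIC | __GFP_NOWARN);
}

As I understand it, with something like the above the driver never has
to look at PAGE_SIZE at all; the pool decides how many 4K frags fit into
one of its pages and the frag count handles the sharing.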