On 2024/8/19 23:52, Alexander Duyck wrote: >> >> Yes, the expectation is that somebody else didn't take an access to the >> page/data to send it off somewhere else between page_frag_alloc_va() >> and page_frag_alloc_abort(), did you see expectation was broken in that >> patch? If yes, we should fix that by using page_frag_free_va() related >> API instead of using page_frag_alloc_abort(). > > The problem is when you expose it to XDP there are a number of > different paths it can take. As such you shouldn't be expecting XDP to > not do something like that. Basically you have to check the reference Even if XDP operations like xdp_do_redirect() or tun_xdp_xmit() return failure, we still can not do that? It seems odd that happens. If not, can we use page_frag_alloc_abort() with fragsz being zero to avoid atomic operation? > count before you can rewind the page. > >>> >>> >>>> >>>>>> +static struct page *__page_frag_cache_reload(struct page_frag_cache *nc, >>>>>> + gfp_t gfp_mask) >>>>>> { >>>>>> + struct page *page; >>>>>> + >>>>>> if (likely(nc->encoded_va)) { >>>>>> - if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias)) >>>>>> + page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias); >>>>>> + if (page) >>>>>> goto out; >>>>>> } >>>>>> >>>>>> - if (unlikely(!__page_frag_cache_refill(nc, gfp_mask))) >>>>>> - return false; >>>>>> + page = __page_frag_cache_refill(nc, gfp_mask); >>>>>> + if (unlikely(!page)) >>>>>> + return NULL; >>>>>> >>>>>> out: >>>>>> /* reset page count bias and remaining to start of new frag */ >>>>>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; >>>>>> nc->remaining = page_frag_cache_page_size(nc->encoded_va); >>>>>> - return true; >>>>>> + return page; >>>>>> +} >>>>>> + >>>>> >>>>> None of the functions above need to be returning page. >>>> >>>> Are you still suggesting to always use virt_to_page() even when it is >>>> not really necessary? why not return the page here to avoid the >>>> virt_to_page()? >>> >>> Yes. The likelihood of you needing to pass this out as a page should >>> be low as most cases will just be you using the virtual address >>> anyway. You are essentially trading off branching for not having to >>> use virt_to_page. It is unnecessary optimization. >> >> As my understanding, I am not trading off branching for not having to >> use virt_to_page, the branching is already needed no matter we utilize >> it to avoid calling virt_to_page() or not, please be more specific about >> which branching is traded off for not having to use virt_to_page() here. > > The virt_to_page overhead isn't that high. It would be better to just > use a consistent path rather than try to optimize for an unlikely > branch in your datapath. I am not sure if I understand what do you mean by 'consistent path' here. If I understand your comment correctly, the path is already not consistent to avoid having to fetch size multiple times multiple ways as mentioned in [1]. As below, doesn't it seems nature to avoid virt_to_page() calling while avoiding page_frag_cache_page_size() calling, even if it is an unlikely case as you mentioned: struct page *page_frag_alloc_pg(struct page_frag_cache *nc, unsigned int *offset, unsigned int fragsz, gfp_t gfp) { unsigned int remaining = nc->remaining; struct page *page; VM_BUG_ON(!fragsz); if (likely(remaining >= fragsz)) { unsigned long encoded_va = nc->encoded_va; *offset = page_frag_cache_page_size(encoded_va) - remaining; return virt_to_page((void *)encoded_va); } if (unlikely(fragsz > PAGE_SIZE)) return NULL; page = __page_frag_cache_reload(nc, gfp); if (unlikely(!page)) return NULL; *offset = 0; nc->remaining -= fragsz; nc->pagecnt_bias--; return page; } 1. https://lore.kernel.org/all/CAKgT0UeQ9gwYo7qttak0UgXC9+kunO2gedm_yjtPiMk4VJp9yQ@xxxxxxxxxxxxxx/ > >>> >>> >>>> >>>>>> +struct page *page_frag_alloc_pg(struct page_frag_cache *nc, >>>>>> + unsigned int *offset, unsigned int fragsz, >>>>>> + gfp_t gfp) >>>>>> +{ >>>>>> + unsigned int remaining = nc->remaining; >>>>>> + struct page *page; >>>>>> + >>>>>> + VM_BUG_ON(!fragsz); >>>>>> + if (likely(remaining >= fragsz)) { >>>>>> + unsigned long encoded_va = nc->encoded_va; >>>>>> + >>>>>> + *offset = page_frag_cache_page_size(encoded_va) - >>>>>> + remaining; >>>>>> + >>>>>> + return virt_to_page((void *)encoded_va); >>>>>> + } >>>>>> + >>>>>> + if (unlikely(fragsz > PAGE_SIZE)) >>>>>> + return NULL; >>>>>> + >>>>>> + page = __page_frag_cache_reload(nc, gfp); >>>>>> + if (unlikely(!page)) >>>>>> + return NULL; >>>>>> + >>>>>> + *offset = 0; >>>>>> + nc->remaining = remaining - fragsz; >>>>>> + nc->pagecnt_bias--; >>>>>> + >>>>>> + return page; >>>>>> } >>>>>> +EXPORT_SYMBOL(page_frag_alloc_pg); >>>>> >>>>> Again, this isn't returning a page. It is essentially returning a >>>>> bio_vec without calling it as such. You might as well pass the bio_vec >>>>> pointer as an argument and just have it populate it directly. >>>> >>>> I really don't think your bio_vec suggestion make much sense for now as >>>> the reason mentioned in below: >>>> >>>> "Through a quick look, there seems to be at least three structs which have >>>> similar values: struct bio_vec & struct skb_frag & struct page_frag. >>>> >>>> As your above agrument about using bio_vec, it seems it is ok to use any >>>> one of them as each one of them seems to have almost all the values we >>>> are using? >>>> >>>> Personally, my preference over them: 'struct page_frag' > 'struct skb_frag' >>>>> 'struct bio_vec', as the naming of 'struct page_frag' seems to best match >>>> the page_frag API, 'struct skb_frag' is the second preference because we >>>> mostly need to fill skb frag anyway, and 'struct bio_vec' is the last >>>> preference because it just happen to have almost all the values needed. >>> >>> That is why I said I would be okay with us passing page_frag in patch >>> 12 after looking closer at the code. The fact is it should make the >>> review of that patch set much easier if you essentially just pass the >>> page_frag back out of the call. Then it could be used in exactly the >>> same way it was before and should reduce the total number of lines of >>> code that need to be changed. >> >> So the your suggestion changed to something like below? >> >> int page_frag_alloc_pfrag(struct page_frag_cache *nc, struct page_frag *pfrag); >> >> The API naming of 'page_frag_alloc_pfrag' seems a little odd to me, any better >> one in your mind? > > Well at this point we are populating/getting/pulling a page frag from > the page frag cache. Maybe look for a word other than alloc such as > populate. Essentially what you are doing is populating the pfrag from > the frag cache, although I thought there was a size value you passed > for that isn't there? 'struct page_frag' does have a size field, but I am not sure if I understand what do you mean by 'although I thought there was a size value you passed for that isn't there?‘ yet.