On 2024/8/15 23:25, Alexander Duyck wrote: > On Wed, Aug 14, 2024 at 8:05 PM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote: >> >> On 2024/8/15 5:00, Alexander H Duyck wrote: > > ... > >>>> +static inline void page_frag_alloc_abort(struct page_frag_cache *nc, >>>> + unsigned int fragsz) >>>> +{ >>>> + nc->pagecnt_bias++; >>>> + nc->remaining += fragsz; >>>> +} >>>> + >>> >>> This doesn't add up. Why would you need abort if you have commit? Isn't >>> this more of a revert? I wouldn't think that would be valid as it is >>> possible you took some sort of action that might have resulted in this >>> memory already being shared. We shouldn't allow rewinding the offset >>> pointer without knowing that there are no other entities sharing the >>> page. >> >> This is used for __tun_build_skb() in drivers/net/tun.c as below, mainly >> used to avoid performance penalty for XDP drop case: > > Yeah, I reviewed that patch. As I said there, rewinding the offset > should be avoided unless you can verify you are the only owner of the > page as you have no guarantees that somebody else didn't take an > access to the page/data to send it off somewhere else. Once you expose > the page to any other entity it should be written off or committed in > your case and you should move on to the next block. Yes, the expectation is that somebody else didn't take an access to the page/data to send it off somewhere else between page_frag_alloc_va() and page_frag_alloc_abort(), did you see expectation was broken in that patch? If yes, we should fix that by using page_frag_free_va() related API instead of using page_frag_alloc_abort(). > > >> >>>> +static struct page *__page_frag_cache_reload(struct page_frag_cache *nc, >>>> + gfp_t gfp_mask) >>>> { >>>> + struct page *page; >>>> + >>>> if (likely(nc->encoded_va)) { >>>> - if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias)) >>>> + page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias); >>>> + if (page) >>>> goto out; >>>> } >>>> >>>> - if (unlikely(!__page_frag_cache_refill(nc, gfp_mask))) >>>> - return false; >>>> + page = __page_frag_cache_refill(nc, gfp_mask); >>>> + if (unlikely(!page)) >>>> + return NULL; >>>> >>>> out: >>>> /* reset page count bias and remaining to start of new frag */ >>>> nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1; >>>> nc->remaining = page_frag_cache_page_size(nc->encoded_va); >>>> - return true; >>>> + return page; >>>> +} >>>> + >>> >>> None of the functions above need to be returning page. >> >> Are you still suggesting to always use virt_to_page() even when it is >> not really necessary? why not return the page here to avoid the >> virt_to_page()? > > Yes. The likelihood of you needing to pass this out as a page should > be low as most cases will just be you using the virtual address > anyway. You are essentially trading off branching for not having to > use virt_to_page. It is unnecessary optimization. As my understanding, I am not trading off branching for not having to use virt_to_page, the branching is already needed no matter we utilize it to avoid calling virt_to_page() or not, please be more specific about which branching is traded off for not having to use virt_to_page() here. > > >> >>>> +struct page *page_frag_alloc_pg(struct page_frag_cache *nc, >>>> + unsigned int *offset, unsigned int fragsz, >>>> + gfp_t gfp) >>>> +{ >>>> + unsigned int remaining = nc->remaining; >>>> + struct page *page; >>>> + >>>> + VM_BUG_ON(!fragsz); >>>> + if (likely(remaining >= fragsz)) { >>>> + unsigned long encoded_va = nc->encoded_va; >>>> + >>>> + *offset = page_frag_cache_page_size(encoded_va) - >>>> + remaining; >>>> + >>>> + return virt_to_page((void *)encoded_va); >>>> + } >>>> + >>>> + if (unlikely(fragsz > PAGE_SIZE)) >>>> + return NULL; >>>> + >>>> + page = __page_frag_cache_reload(nc, gfp); >>>> + if (unlikely(!page)) >>>> + return NULL; >>>> + >>>> + *offset = 0; >>>> + nc->remaining = remaining - fragsz; >>>> + nc->pagecnt_bias--; >>>> + >>>> + return page; >>>> } >>>> +EXPORT_SYMBOL(page_frag_alloc_pg); >>> >>> Again, this isn't returning a page. It is essentially returning a >>> bio_vec without calling it as such. You might as well pass the bio_vec >>> pointer as an argument and just have it populate it directly. >> >> I really don't think your bio_vec suggestion make much sense for now as >> the reason mentioned in below: >> >> "Through a quick look, there seems to be at least three structs which have >> similar values: struct bio_vec & struct skb_frag & struct page_frag. >> >> As your above agrument about using bio_vec, it seems it is ok to use any >> one of them as each one of them seems to have almost all the values we >> are using? >> >> Personally, my preference over them: 'struct page_frag' > 'struct skb_frag' >>> 'struct bio_vec', as the naming of 'struct page_frag' seems to best match >> the page_frag API, 'struct skb_frag' is the second preference because we >> mostly need to fill skb frag anyway, and 'struct bio_vec' is the last >> preference because it just happen to have almost all the values needed. > > That is why I said I would be okay with us passing page_frag in patch > 12 after looking closer at the code. The fact is it should make the > review of that patch set much easier if you essentially just pass the > page_frag back out of the call. Then it could be used in exactly the > same way it was before and should reduce the total number of lines of > code that need to be changed. So the your suggestion changed to something like below? int page_frag_alloc_pfrag(struct page_frag_cache *nc, struct page_frag *pfrag); The API naming of 'page_frag_alloc_pfrag' seems a little odd to me, any better one in your mind? > >> Is there any specific reason other than the above "almost all the values you >> are using are exposed by that structure already " that you prefer bio_vec?" >> >> 1. https://lore.kernel.org/all/ca6be29e-ab53-4673-9624-90d41616a154@xxxxxxxxxx/ > > My reason for preferring bio_vec is that of the 3 it is the most setup > to be used as a local variable versus something stored in a struct > such as page_frag or used for some specialty user case such as > skb_frag_t. In addition it already has a set of helpers for converting > it to a virtual address or copying data to and from it which would > make it easier to get rid of a bunch of duplicate code.