Re: [PATCH net-next v13 11/14] mm: page_frag: introduce prepare/probe/commit API

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 14, 2024 at 8:05 PM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
>
> On 2024/8/15 5:00, Alexander H Duyck wrote:

...

> >> +static inline void page_frag_alloc_abort(struct page_frag_cache *nc,
> >> +                                     unsigned int fragsz)
> >> +{
> >> +    nc->pagecnt_bias++;
> >> +    nc->remaining += fragsz;
> >> +}
> >> +
> >
> > This doesn't add up. Why would you need abort if you have commit? Isn't
> > this more of a revert? I wouldn't think that would be valid as it is
> > possible you took some sort of action that might have resulted in this
> > memory already being shared. We shouldn't allow rewinding the offset
> > pointer without knowing that there are no other entities sharing the
> > page.
>
> This is used for __tun_build_skb() in drivers/net/tun.c as below, mainly
> used to avoid performance penalty for XDP drop case:

Yeah, I reviewed that patch. As I said there, rewinding the offset
should be avoided unless you can verify you are the only owner of the
page as you have no guarantees that somebody else didn't take an
access to the page/data to send it off somewhere else. Once you expose
the page to any other entity it should be written off or committed in
your case and you should move on to the next block.


>
> >> +static struct page *__page_frag_cache_reload(struct page_frag_cache *nc,
> >> +                                         gfp_t gfp_mask)
> >>  {
> >> +    struct page *page;
> >> +
> >>      if (likely(nc->encoded_va)) {
> >> -            if (__page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias))
> >> +            page = __page_frag_cache_reuse(nc->encoded_va, nc->pagecnt_bias);
> >> +            if (page)
> >>                      goto out;
> >>      }
> >>
> >> -    if (unlikely(!__page_frag_cache_refill(nc, gfp_mask)))
> >> -            return false;
> >> +    page = __page_frag_cache_refill(nc, gfp_mask);
> >> +    if (unlikely(!page))
> >> +            return NULL;
> >>
> >>  out:
> >>      /* reset page count bias and remaining to start of new frag */
> >>      nc->pagecnt_bias = PAGE_FRAG_CACHE_MAX_SIZE + 1;
> >>      nc->remaining = page_frag_cache_page_size(nc->encoded_va);
> >> -    return true;
> >> +    return page;
> >> +}
> >> +
> >
> > None of the functions above need to be returning page.
>
> Are you still suggesting to always use virt_to_page() even when it is
> not really necessary? why not return the page here to avoid the
> virt_to_page()?

Yes. The likelihood of you needing to pass this out as a page should
be low as most cases will just be you using the virtual address
anyway. You are essentially trading off branching for not having to
use virt_to_page. It is unnecessary optimization.


>
> >> +struct page *page_frag_alloc_pg(struct page_frag_cache *nc,
> >> +                            unsigned int *offset, unsigned int fragsz,
> >> +                            gfp_t gfp)
> >> +{
> >> +    unsigned int remaining = nc->remaining;
> >> +    struct page *page;
> >> +
> >> +    VM_BUG_ON(!fragsz);
> >> +    if (likely(remaining >= fragsz)) {
> >> +            unsigned long encoded_va = nc->encoded_va;
> >> +
> >> +            *offset = page_frag_cache_page_size(encoded_va) -
> >> +                            remaining;
> >> +
> >> +            return virt_to_page((void *)encoded_va);
> >> +    }
> >> +
> >> +    if (unlikely(fragsz > PAGE_SIZE))
> >> +            return NULL;
> >> +
> >> +    page = __page_frag_cache_reload(nc, gfp);
> >> +    if (unlikely(!page))
> >> +            return NULL;
> >> +
> >> +    *offset = 0;
> >> +    nc->remaining = remaining - fragsz;
> >> +    nc->pagecnt_bias--;
> >> +
> >> +    return page;
> >>  }
> >> +EXPORT_SYMBOL(page_frag_alloc_pg);
> >
> > Again, this isn't returning a page. It is essentially returning a
> > bio_vec without calling it as such. You might as well pass the bio_vec
> > pointer as an argument and just have it populate it directly.
>
> I really don't think your bio_vec suggestion make much sense  for now as
> the reason mentioned in below:
>
> "Through a quick look, there seems to be at least three structs which have
> similar values: struct bio_vec & struct skb_frag & struct page_frag.
>
> As your above agrument about using bio_vec, it seems it is ok to use any
> one of them as each one of them seems to have almost all the values we
> are using?
>
> Personally, my preference over them: 'struct page_frag' > 'struct skb_frag'
> > 'struct bio_vec', as the naming of 'struct page_frag' seems to best match
> the page_frag API, 'struct skb_frag' is the second preference because we
> mostly need to fill skb frag anyway, and 'struct bio_vec' is the last
> preference because it just happen to have almost all the values needed.

That is why I said I would be okay with us passing page_frag in patch
12 after looking closer at the code. The fact is it should make the
review of that patch set much easier if you essentially just pass the
page_frag back out of the call. Then it could be used in exactly the
same way it was before and should reduce the total number of lines of
code that need to be changed.

> Is there any specific reason other than the above "almost all the values you
> are using are exposed by that structure already " that you prefer bio_vec?"
>
> 1. https://lore.kernel.org/all/ca6be29e-ab53-4673-9624-90d41616a154@xxxxxxxxxx/

My reason for preferring bio_vec is that of the 3 it is the most setup
to be used as a local variable versus something stored in a struct
such as page_frag or used for some specialty user case such as
skb_frag_t. In addition it already has a set of helpers for converting
it to a virtual address or copying data to and from it which would
make it easier to get rid of a bunch of duplicate code.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux