On Mon, Dec 11, 2023 at 12:37 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote: ... > >> If you remove the branch, let it fall into ->release and rely > >> on refcounting there, then the callback could also fix up > >> release_cnt or ask pp to do it, like in the patch I linked above > >> > > > > Sadly I don't think this is possible due to the reasons I mention in > > the commit message of that patch. Prematurely releasing ppiov and not > > having them be candidates for recycling shows me a 4-5x degradation in > > performance. > > I don't think I follow. The concept is to only recycle a buffer (i.e. > make it available for allocation) when its refs drop to zero, which is > IMHO the only way it can work, and IIUC what this patchset is doing. > > That's also I suggest to do, but through a slightly different path. > Let's say at some moment there are 2 refs (e.g. 1 for an skb and > 1 for userspace/xarray). > > Say it first puts the skb: > > napi_pp_put_page() > -> page_pool_return_page() > -> mp_ops->release_page() > -> need_to_free = put_buf() > // not last ref, need_to_free==false, > // don't recycle, don't increase release_cnt > > Then you put the last ref: > > page_pool_iov_put_many() > -> page_pool_return_page() > -> mp_ops->release_page() > -> need_to_free = put_buf() > // last ref, need_to_free==true, > // recycle and release_cnt++ > > And that last put can even be recycled right into the > pp / ptr_ring, in which case it doesn't need to touch > release_cnt. Does it make sense? I don't see where > 4-5x degradation would come from > > Sorry for the late reply, I have been working on this locally. What you're saying makes sense, and I'm no longer sure why I was seeing a perf degradation without '[net-next v1 10/16] page_pool: don't release iov on elevanted refcount'. However, even though what you're saying is technically correct, AFAIU it's actually semantically wrong. When a page is released by the page_pool, we should call page_pool_clear_pp_info() and completely disconnect the page from the pool. If we call release_page() on a page and then the page pool sees it again in page_pool_return_page(), I think that is considered a bug. In fact I think what you're proposing is as a result of a bug because we don't call a page_pool_clear_pp_info() equivalent on releasing ppiov. However, I'm reasonably confident I figured out the right thing to do here. The page_pool uses page->pp_frag_count for its refcounting. pp_frag_count is a misnomer, it's being renamed to pp_ref_count in Liang's series[1]). In this series I used a get_page/put_page equivalent for refcounting. Once I transitioned to using pp_[frag|ref]_count for refcounting inside the page_pool, the issue went away, and I no longer need the patch 'page_pool: don't release iov on elevanted refcount'. There is an additional upside, since pages and ppiovs are both being refcounted using pp_[frag|ref]_count, we get some unified handling for ppiov and we reduce the checks around ppiov. This should be fixed properly in the next series. I still need to do some work (~1 week) before I upload the next version as there is a new requirement from MM that we transition to a new type and not re-use page*, but I uploaded my changes github with the refcounting issues resolved in case they're useful to you. Sorry for the churn: https://github.com/mina/linux/commits/tcpdevmem-v1.5/ [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=809049&state=* -- Thanks, Mina