On 12/14/23 20:03, Mina Almasry wrote:
On Mon, Dec 11, 2023 at 12:37 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
...
If you remove the branch, let it fall into ->release and rely
on refcounting there, then the callback could also fix up
release_cnt or ask pp to do it, like in the patch I linked above
Sadly I don't think this is possible due to the reasons I mention in
the commit message of that patch. Prematurely releasing ppiov and not
having them be candidates for recycling shows me a 4-5x degradation in
performance.
I don't think I follow. The concept is to only recycle a buffer (i.e.
make it available for allocation) when its refs drop to zero, which is
IMHO the only way it can work, and IIUC what this patchset is doing.
That's also I suggest to do, but through a slightly different path.
Let's say at some moment there are 2 refs (e.g. 1 for an skb and
1 for userspace/xarray).
Say it first puts the skb:
napi_pp_put_page()
-> page_pool_return_page()
-> mp_ops->release_page()
-> need_to_free = put_buf()
// not last ref, need_to_free==false,
// don't recycle, don't increase release_cnt
Then you put the last ref:
page_pool_iov_put_many()
-> page_pool_return_page()
-> mp_ops->release_page()
-> need_to_free = put_buf()
// last ref, need_to_free==true,
// recycle and release_cnt++
And that last put can even be recycled right into the
pp / ptr_ring, in which case it doesn't need to touch
release_cnt. Does it make sense? I don't see where
4-5x degradation would come from
Sorry for the late reply, I have been working on this locally.
What you're saying makes sense, and I'm no longer sure why I was
seeing a perf degradation without '[net-next v1 10/16] page_pool:
don't release iov on elevanted refcount'. However, even though what
you're saying is technically correct, AFAIU it's actually semantically
wrong. When a page is released by the page_pool, we should call
page_pool_clear_pp_info() and completely disconnect the page from the
pool. If we call release_page() on a page and then the page pool sees
it again in page_pool_return_page(), I think that is considered a bug.
You're adding a new feature the semantics of which is already
different from what is in there, you can extend it any way as long
as it makes sense and agreed on. IMHO, it does. But well, if
there is a better solution I'm all for it.
In fact I think what you're proposing is as a result of a bug because
we don't call a page_pool_clear_pp_info() equivalent on releasing
ppiov.
I don't get it, what bug? page_pool_clear_pp_info() is not called
for ppiov because it doesn't make sense to call it for ppiov,
there is no reason to clear ppiov->pp, nor there is any pp_magic.
However, I'm reasonably confident I figured out the right thing to do
here. The page_pool uses page->pp_frag_count for its refcounting.
pp_frag_count is a misnomer, it's being renamed to pp_ref_count in
Liang's series[1]). In this series I used a get_page/put_page
equivalent for refcounting. Once I transitioned to using
pp_[frag|ref]_count for refcounting inside the page_pool, the issue
went away, and I no longer need the patch 'page_pool: don't release
iov on elevanted refcount'.
Lovely, I'll take a look later! (also assuming it's in v5)
There is an additional upside, since pages and ppiovs are both being
refcounted using pp_[frag|ref]_count, we get some unified handling for
ppiov and we reduce the checks around ppiov. This should be fixed
properly in the next series.
I still need to do some work (~1 week) before I upload the next
version as there is a new requirement from MM that we transition to a
new type and not re-use page*, but I uploaded my changes github with
the refcounting issues resolved in case they're useful to you. Sorry
for the churn:
https://github.com/mina/linux/commits/tcpdevmem-v1.5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=809049&state=*
--
Pavel Begunkov