On Tue, Mar 04, 2025 at 05:32:32PM +0100, Hannes Reinecke wrote: > On 3/4/25 17:14, Matthew Wilcox wrote: > > I thought we'd done all the work needed to get rid of these pointless > > refcount bumps. Turns out that's only on the block side (eg commit > > e4cc64657bec). So what does networking need in order to understand > > that some iovecs do not need to mess with the refcount? > > The network stack needs to get hold of the page while transmission is > ongoing, as there is potentially rather deep queueing involved, > requiring several calls to sendmsg() and friends before the page is finally > transmitted. And maybe some post-processing (checksums, > digests, you name it), too, all of which require the page to be there. > > It's all so jumbled up ... personally, I would _love_ to do away with > __iov_iter_get_pages_alloc(). Allocating a page array? Seriously? > > And the problem with that is that it's always takes a page(!) reference, > completely oblivious to the fact whether you even _can_ take a page > reference (eg for tail pages); we've hit this problem several times now > (check for sendpage_ok() ...). Calling get_page() / put_page() on a tail page is fine -- that just redirects to the head page. But calling it on a slab never made any sense; at best it gets you the equivalent of TYPESAFE_BY_RCU -- that is, the object can be freed and reallocated, but the underlying slab will not be reallocated to some other purpose. > But that's not the real issue; real issue is that the page reference is > taken down in the very bowels of __iov_iter_get_pages_alloc(), but needs > to be undone by the _caller_. Who might (or might not) have an idea > that he needs to drop the reference here. > That's why there is no straightforward conversion; you need to audit > each and every caller and try to find out where the page reference (if any) > is dropped. > Bah. > > Can't we (at the very least) leave it to the caller of > __iov_iter_get_pages() to get a page reference (he has access to the page > array, after all ...)? That would make the interface slightly > better, and it'll be far more obvious to the caller what needs > to be done. Right, that's what happened in the block layer. We mark the bio with BIO_PAGE_PINNED if the pincount needs to be dropped. As a transitional period, we had BIO_PAGE_REFFED which indicated that the page refcount needed to be dropped. Perhaps there's something similar that network could be doing.