On 3/11/25 11:15, Jakub Kicinski wrote:
On Mon, 10 Mar 2025 14:35:24 +0000 Matthew Wilcox (Oracle) wrote:
Long-term, networking needs to stop taking a refcount on the pages that
it uses and rely on the caller to hold whatever references are necessary
to make the memory stable.
TBH I'm not clear on who is going to fix this.
IIRC we already told NVMe people that sending slab memory over sendpage
is not well supported. Plus the bug is in BPF integration, judging by
the stack traces (skmsg is a BPF thing). Joy.
Hmm. Did you? Seem to have missed it.
We make sure to not do it via the 'sendpage_ok()' call; but other than
that it's not much we can do.
And BPF is probably not the culprit; issue here is that we have a kvec,
package it into a bio (where it gets converted into a bvec),
and then call an iov iterator in tls_sw to get to the pages.
But at that stage we only see the bvec iterator, and the information
that it was an kvec to start with has been lost.
All wouldn't be so bad if we wouldn't call get_page/put_page (the caller
holds the reference, after all), but iov iterators and the skmsg code
insists upon.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich