On Tue, Mar 11, 2025 at 04:46:45PM +0100, Hannes Reinecke wrote:
> On 3/11/25 11:15, Jakub Kicinski wrote:
> > On Mon, 10 Mar 2025 14:35:24 +0000 Matthew Wilcox (Oracle) wrote:
> > > Long-term, networking needs to stop taking a refcount on the pages that
> > > it uses and rely on the caller to hold whatever references are necessary
> > > to make the memory stable.
> >
> > TBH I'm not clear on who is going to fix this.
> > IIRC we already told NVMe people that sending slab memory over sendpage
> > is not well supported. Plus the bug is in BPF integration, judging by
> > the stack traces (skmsg is a BPF thing). Joy.
>
> Hmm. Did you? Seem to have missed it.
> We make sure to not do it via the 'sendpage_ok()' call; but other than
> that it's not much we can do.
>
> And BPF is probably not the culprit; issue here is that we have a kvec,
> package it into a bio (where it gets converted into a bvec),
> and then call an iov iterator in tls_sw to get to the pages.
> But at that stage we only see the bvec iterator, and the information
> that it was an kvec to start with has been lost.

So I have two questions:

Hannes:
 - Why does nvme need to turn the kvec into a bio rather than just
   send it directly?
Jakub:
 - Why does the socket code think it needs to get a refcount on a
   bvec at all, since the block layer doesn't?