On 3/13/25 09:44, Christoph Hellwig wrote:
> On Thu, Mar 13, 2025 at 09:34:39AM +0100, Hannes Reinecke wrote:
>> nvmf_connect_command_prep() returns a kmalloced buffer.
> Yes.
>> That is stored in a bvec in _nvme_submit_sync_cmd() via
>> blk_mq_rq_map_kern()->bio_map_kern().
>> And from that point on we are dealing with bvecs (iterators
>> and all), and losing the information that the page referenced
>> is a slab page.
> Yes. But so does every other consumer of the block layer that passes
> slab memory, of which there are quite a few. Various internal scsi
> and nvme commands come to mind, as does the XFS buffer cache.
>> The argument is that the network layer expected a kvec iterator
>> when slab pages are referred to, not a bvec iterator.
> It doesn't. It just doesn't want you to use ->sendpage.
But we don't; we call 'sendpage_ok()' and disable the MSG_SPLICE_PAGES
flag. The actual issue is that tls_sw() is calling
iov_iter_alloc_pages(), which takes a page reference.
It probably should be calling iov_iter_extract_pages() (which does not
take a reference), but then one would need to review the entire network
stack, as the taking and releasing of page references is littered
throughout the stack.
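
To make the two points above concrete, here is a minimal userspace
sketch, not kernel code. It assumes sendpage_ok() amounts to "not a slab
page and page count of at least one". struct model_page, the model_*()
helpers and the MODEL_MSG_SPLICE_PAGES value are all hypothetical names
invented for illustration only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page; NOT the kernel type. */
struct model_page {
	bool slab;     /* models PageSlab(page) */
	int refcount;  /* models page_count(page) */
};

/* Hypothetical flag value, valid for this model only. */
#define MODEL_MSG_SPLICE_PAGES 0x1u

/* Assumed rule: slab pages and unreferenced pages must not be spliced. */
static bool model_sendpage_ok(const struct model_page *page)
{
	return !page->slab && page->refcount >= 1;
}

/* Caller side: clear the splice flag when the page fails the check. */
static unsigned int model_send_flags(const struct model_page *page,
				     unsigned int flags)
{
	if (!model_sendpage_ok(page))
		flags &= ~MODEL_MSG_SPLICE_PAGES;
	return flags;
}

/*
 * Models a path that takes a page reference while walking the iterator.
 * Nothing here looks at page->slab, which is the problem described above:
 * the reference is taken even though the caller cleared the splice flag.
 */
static void model_get_ref(struct model_page *page)
{
	page->refcount++;
}

/* Models an "extract"-style path that hands out the page without a ref. */
static struct model_page *model_extract(struct model_page *page)
{
	return page;
}
```

In this model, clearing the flag on the sender side does nothing to stop
model_get_ref() from bumping the count of a slab page further down,
whereas model_extract() leaves the count alone.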
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich