On 17/07/2024 23:26, Ilya Dryomov wrote:
On Tue, Jul 16, 2024 at 2:46 PM Ofir Gal <ofir.gal@xxxxxxxxxxx> wrote:
Xiubo/Ilya please take a look
On 6/11/24 09:36, Ofir Gal wrote:
Currently ceph_tcp_sendpage() and do_try_sendpage() use sendpage_ok() to
decide whether to enable MSG_SPLICE_PAGES. It checks only the first page
of the iterator, but the iterator may represent several contiguous pages.
MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
pages it sends with sendpage_ok().
When ceph_tcp_sendpage() or do_try_sendpage() sends an iterator whose
first page is sendable but one of the other pages isn't,
skb_splice_from_iter() warns and aborts the data transfer.
Using the new helper sendpages_ok() in order to enable MSG_SPLICE_PAGES
solves the issue.
Signed-off-by: Ofir Gal <ofir.gal@xxxxxxxxxxx>
---
net/ceph/messenger_v1.c | 2 +-
net/ceph/messenger_v2.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ceph/messenger_v1.c b/net/ceph/messenger_v1.c
index 0cb61c76b9b8..a6788f284cd7 100644
--- a/net/ceph/messenger_v1.c
+++ b/net/ceph/messenger_v1.c
@@ -94,7 +94,7 @@ static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
 	 * coalescing neighboring slab objects into a single frag which
 	 * triggers one of hardened usercopy checks.
 	 */
-	if (sendpage_ok(page))
+	if (sendpages_ok(page, size, offset))
 		msg.msg_flags |= MSG_SPLICE_PAGES;
 
 	bvec_set_page(&bvec, page, size, offset);
diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index bd608ffa0627..27f8f6c8eb60 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -165,7 +165,7 @@ static int do_try_sendpage(struct socket *sock, struct iov_iter *it)
 		 * coalescing neighboring slab objects into a single frag
 		 * which triggers one of hardened usercopy checks.
 		 */
-		if (sendpage_ok(bv.bv_page))
+		if (sendpages_ok(bv.bv_page, bv.bv_len, bv.bv_offset))
 			msg.msg_flags |= MSG_SPLICE_PAGES;
 		else
 			msg.msg_flags &= ~MSG_SPLICE_PAGES;
Hi Ofir,
Ceph should be fine as is -- there is an internal "cursor" abstraction
that is limited to PAGE_SIZE chunks, using bvec_iter_bvec() instead
of mp_bvec_iter_bvec(), etc. This means that both do_try_sendpage() and
ceph_tcp_sendpage() should be called only with
page_off + len <= PAGE_SIZE
being true even if the page is contiguous (and that we lose out on the
potential performance benefit, of course...).
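To make the invariant concrete, here is a rough userspace model (not kernel code, and not the actual sendpages_ok() implementation). MODEL_PAGE_SIZE, struct model_page and both model_* functions are hypothetical names made up for this sketch, and the 4096-byte page size is an assumption. It only shows that a first-page check and an every-page check agree whenever page_off + len <= PAGE_SIZE, and can disagree once the range crosses into a second page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for this model; the kernel's PAGE_SIZE depends on the arch. */
#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct page: one flag saying whether it may be spliced. */
struct model_page {
	bool sendable;
};

/* Models a check of only the first page touched by the range. */
static bool model_first_page_ok(const struct model_page *pages, size_t offset)
{
	return pages[offset / MODEL_PAGE_SIZE].sendable;
}

/* Models a check of every page touched by [offset, offset + len). */
static bool model_all_pages_ok(const struct model_page *pages, size_t len,
			       size_t offset)
{
	size_t first, last, i;

	if (len == 0)
		return true;	/* an empty range touches no pages */

	first = offset / MODEL_PAGE_SIZE;
	last = (offset + len - 1) / MODEL_PAGE_SIZE;

	for (i = first; i <= last; i++)
		if (!pages[i].sendable)
			return false;

	return true;
}
```

With pages[0] sendable and pages[1] not, a range such as offset 4000, len 96 stays inside the first page and both checks pass. Offset 4000, len 97 spills one byte into the second page, so only the every-page check rejects it.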
That said, if the plan is to remove sendpage_ok() so that it doesn't
accidentally grow new users who are unaware of this pitfall, consider
this
Acked-by: Ilya Dryomov <idryomov@xxxxxxxxx>
Which tree should this go through? We can take it via the nvme tree,
unless someone else wants to queue it up...