> On Sep 22, 2020, at 3:13 AM, He Zhe <zhe.he@xxxxxxxxxxxxx> wrote: > > > > On 9/21/20 3:51 AM, Chuck Lever wrote: >> On platforms that implement flush_dcache_page(), a large NFS WRITE >> triggers the WARN_ONCE in bvec_iter_advance(): >> >> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter >> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc] >> >> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace: >> Sep 20 14:01:05 klimt.1015granger.net kernel: svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc] >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc] >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? del_timer_sync+0x4b/0x55 >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? test_bit+0x1d/0x27 [sunrpc] >> Sep 20 14:01:05 klimt.1015granger.net kernel: svc_recv+0x1193/0x15e4 [sunrpc] >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc] >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc] >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? svc_xprt_put+0x1e/0x29f [sunrpc] >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? svc_send+0x39f/0x3c1 [sunrpc] >> Sep 20 14:01:05 klimt.1015granger.net kernel: nfsd+0x282/0x345 [nfsd] >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? __kthread_parkme+0x74/0xba >> Sep 20 14:01:05 klimt.1015granger.net kernel: kthread+0x2ad/0x2bc >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? nfsd_destroy+0x124/0x124 [nfsd] >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? test_bit+0x1d/0x27 >> Sep 20 14:01:05 klimt.1015granger.net kernel: ? kthread_mod_delayed_work+0x115/0x115 >> Sep 20 14:01:05 klimt.1015granger.net kernel: ret_from_fork+0x22/0x30 >> >> Reported-by: He Zhe <zhe.he@xxxxxxxxxxxxx> >> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()") >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >> --- >> net/sunrpc/svcsock.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> Hi Zhe- >> >> If you confirm this fixes your issue and there are no other >> objections or regressions, I can submit this for v5.9-rc. > > I don't quite get why we add "seek" to "size". It seems this action does not > reflect the actual scenario and forcedly neutralizes the WARN_ONCE check in > bvec_iter_advance, so that it may "advance past end of bvec iter" and thus > introduces overflow. > Why don't we avoid this problem at the very begginning like my v1? That is, call > svc_flush_bvec only when we have received more than we want to seek. > > len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT); > - if (len > 0) > + if (len > 0 && (size_t)len > (seek & PAGE_MASK)) > svc_flush_bvec(bvec, len, seek); Because this doesn't fix the underlying bug that triggered the WARN_ONCE. svc_tcp_recvfrom() attempts to assemble a possibly large RPC Call from a sequence of sock_recvmsg's. @seek is the running number of bytes that has been received so far for the RPC Call we are assembling. @size is the number of bytes that was just received in the most recent sock_recvmsg. We want svc_flush_bvec to flush just the area of @bvec that hasn't been flushed yet. Thus: the current size of the partial Call message in @bvec is @seek + @size. The starting location of the flush is @seek & PAGE_MASK. This aligns the flush so it starts on a page boundary. This: 230 struct bvec_iter bi = { 231 .bi_size = size + seek, 232 }; 235 bvec_iter_advance(bvec, &bi, seek & PAGE_MASK); advances the bvec_iter to the part of @bvec that hasn't been flushed yet. This loop: 236 for_each_bvec(bv, bvec, bi, bi) 237 flush_dcache_page(bv.bv_page); flushes each page starting at that point to the end of the bytes that have been received so far In other words, ca07eda33e01 was wrong because it always flushed the first section of @bvec, never the later parts of it. > Regards, > Zhe > >> >> >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c >> index d5805fa1d066..c2752e2b9ce3 100644 >> --- a/net/sunrpc/svcsock.c >> +++ b/net/sunrpc/svcsock.c >> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining) >> static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek) >> { >> struct bvec_iter bi = { >> - .bi_size = size, >> + .bi_size = size + seek, >> }; >> struct bio_vec bv; -- Chuck Lever