On 9/22/20 10:14 PM, Chuck Lever wrote: > >> On Sep 22, 2020, at 3:13 AM, He Zhe <zhe.he@xxxxxxxxxxxxx> wrote: >> >> >> >> On 9/21/20 3:51 AM, Chuck Lever wrote: >>> On platforms that implement flush_dcache_page(), a large NFS WRITE >>> triggers the WARN_ONCE in bvec_iter_advance(): >>> >>> Sep 20 14:01:05 klimt.1015granger.net kernel: Attempted to advance past end of bvec iter >>> Sep 20 14:01:05 klimt.1015granger.net kernel: WARNING: CPU: 0 PID: 1032 at include/linux/bvec.h:101 bvec_iter_advance.isra.0+0xa7/0x158 [sunrpc] >>> >>> Sep 20 14:01:05 klimt.1015granger.net kernel: Call Trace: >>> Sep 20 14:01:05 klimt.1015granger.net kernel: svc_tcp_recvfrom+0x60c/0x12c7 [sunrpc] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? bvec_iter_advance.isra.0+0x158/0x158 [sunrpc] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? del_timer_sync+0x4b/0x55 >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? test_bit+0x1d/0x27 [sunrpc] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: svc_recv+0x1193/0x15e4 [sunrpc] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? try_to_freeze.isra.0+0x6f/0x6f [sunrpc] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? refcount_sub_and_test.constprop.0+0x13/0x40 [sunrpc] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? svc_xprt_put+0x1e/0x29f [sunrpc] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? svc_send+0x39f/0x3c1 [sunrpc] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: nfsd+0x282/0x345 [nfsd] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? __kthread_parkme+0x74/0xba >>> Sep 20 14:01:05 klimt.1015granger.net kernel: kthread+0x2ad/0x2bc >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? nfsd_destroy+0x124/0x124 [nfsd] >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? test_bit+0x1d/0x27 >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ? kthread_mod_delayed_work+0x115/0x115 >>> Sep 20 14:01:05 klimt.1015granger.net kernel: ret_from_fork+0x22/0x30 >>> >>> Reported-by: He Zhe <zhe.he@xxxxxxxxxxxxx> >>> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()") >>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>> --- >>> net/sunrpc/svcsock.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> Hi Zhe- >>> >>> If you confirm this fixes your issue and there are no other >>> objections or regressions, I can submit this for v5.9-rc. >> I don't quite get why we add "seek" to "size". It seems this action does not >> reflect the actual scenario and forcedly neutralizes the WARN_ONCE check in >> bvec_iter_advance, so that it may "advance past end of bvec iter" and thus >> introduces overflow. >> Why don't we avoid this problem at the very begginning like my v1? That is, call >> svc_flush_bvec only when we have received more than we want to seek. >> >> len = sock_recvmsg(svsk->sk_sock, &msg, MSG_DONTWAIT); >> - if (len > 0) >> + if (len > 0 && (size_t)len > (seek & PAGE_MASK)) >> svc_flush_bvec(bvec, len, seek); > Because this doesn't fix the underlying bug that triggered the > WARN_ONCE. > > svc_tcp_recvfrom() attempts to assemble a possibly large RPC Call > from a sequence of sock_recvmsg's. > > @seek is the running number of bytes that has been received so > far for the RPC Call we are assembling. @size is the number of > bytes that was just received in the most recent sock_recvmsg. > > We want svc_flush_bvec to flush just the area of @bvec that > hasn't been flushed yet. > > Thus: the current size of the partial Call message in @bvec is > @seek + @size. The starting location of the flush is > @seek & PAGE_MASK. This aligns the flush so it starts on a page > boundary. > > This: > > 230 struct bvec_iter bi = { > 231 .bi_size = size + seek, > 232 }; > > 235 bvec_iter_advance(bvec, &bi, seek & PAGE_MASK); > > advances the bvec_iter to the part of @bvec that hasn't been > flushed yet. > > This loop: > > 236 for_each_bvec(bv, bvec, bi, bi) > 237 flush_dcache_page(bv.bv_page); > > flushes each page starting at that point to the end of the bytes > that have been received so far > > In other words, ca07eda33e01 was wrong because it always flushed > the first section of @bvec, never the later parts of it. Thanks for clarification. I just tested the patch. It works well. Zhe > > >> Regards, >> Zhe >> >>> >>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c >>> index d5805fa1d066..c2752e2b9ce3 100644 >>> --- a/net/sunrpc/svcsock.c >>> +++ b/net/sunrpc/svcsock.c >>> @@ -228,7 +228,7 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining) >>> static void svc_flush_bvec(const struct bio_vec *bvec, size_t size, size_t seek) >>> { >>> struct bvec_iter bi = { >>> - .bi_size = size, >>> + .bi_size = size + seek, >>> }; >>> struct bio_vec bv; > -- > Chuck Lever > > >