On Fri, Jan 24, 2025 at 2:00 PM Ihor Solodrai <ihor.solodrai@xxxxx> wrote:
>
> On Monday, December 16th, 2024 at 12:41 PM, David Howells <dhowells@xxxxxxxxxx> wrote:
>
> > Change the way netfslib collects read results to do all the collection for
> > a particular read request using a single work item that walks along the
> > subrequest queue as subrequests make progress or complete, unlocking folios
> > progressively rather than doing the unlock in parallel as parallel requests
> > come in.
> >
> > The code is remodelled to be more like the write-side code, though only
> > using a single stream.  This makes it more directly comparable and thus
> > easier to duplicate fixes between the two sides.
> >
> > This has a number of advantages:
> >
> >  (1) It's simpler.  There doesn't need to be a complex donation mechanism
> >      to handle mismatches between the size and alignment of subrequests and
> >      folios.  The collector unlocks folios as the subrequests covering each
> >      complete.
> >
> >  (2) It should cause less scheduler overhead as there's a single work item
> >      in play unlocking pages in parallel when a read gets split up into a
> >      lot of subrequests instead of one per subrequest.
> >
> >      Whilst the parallelism is nice in theory, in practice, the vast
> >      majority of loads are sequential reads of the whole file, so
> >      committing a bunch of threads to unlocking folios out of order doesn't
> >      help in those cases.
> >
> >  (3) It should make it easier to implement content decryption.  A folio
> >      cannot be decrypted until all the requests that contribute to it have
> >      completed - and, again, most loads are sequential and so, most of the
> >      time, we want to begin decryption sequentially (though it's great if
> >      the decryption can happen in parallel).
> >
> > There is a disadvantage in that we're losing the ability to decrypt and
> > unlock things on an as-things-arrive basis, which may affect some
> > applications.
> >
> > Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> > cc: Jeff Layton <jlayton@xxxxxxxxxx>
> > cc: netfs@xxxxxxxxxxxxxxx
> > cc: linux-fsdevel@xxxxxxxxxxxxxxx
> > ---
> >  fs/9p/vfs_addr.c             |   3 +-
> >  fs/afs/dir.c                 |   8 +-
> >  fs/ceph/addr.c               |   9 +-
> >  fs/netfs/buffered_read.c     | 160 ++++----
> >  fs/netfs/direct_read.c       |  60 +--
> >  fs/netfs/internal.h          |  21 +-
> >  fs/netfs/main.c              |   2 +-
> >  fs/netfs/objects.c           |  34 +-
> >  fs/netfs/read_collect.c      | 716 ++++++++++++++++++++---------------
> >  fs/netfs/read_pgpriv2.c      | 203 ++++------
> >  fs/netfs/read_retry.c        | 207 +++++-----
> >  fs/netfs/read_single.c       |  37 +-
> >  fs/netfs/write_collect.c     |   4 +-
> >  fs/netfs/write_issue.c       |   2 +-
> >  fs/netfs/write_retry.c       |  14 +-
> >  fs/smb/client/cifssmb.c      |   2 +
> >  fs/smb/client/smb2pdu.c      |   5 +-
> >  include/linux/netfs.h        |  16 +-
> >  include/trace/events/netfs.h |  79 +---
> >  19 files changed, 819 insertions(+), 763 deletions(-)
>
> Hello David.
>
> After a recent merge from upstream, BPF CI started consistently failing
> with a task hanging in v9fs_evict_inode. I bisected the failure to
> commit e2d46f2ec332, pointing to this patch.
>
> Reverting the patch seems to have helped:
> https://github.com/kernel-patches/vmtest/actions/runs/12952856569
>
> Could you please investigate?
>
> Examples of failed jobs:
> * https://github.com/kernel-patches/bpf/actions/runs/12941732247
> * https://github.com/kernel-patches/bpf/actions/runs/12933849075
>
> A log snippet:
>
> 2025-01-24T02:15:03.9009694Z [ 246.932163] INFO: task ip:1055 blocked for more than 122 seconds.
> 2025-01-24T02:15:03.9013633Z [ 246.932709] Tainted: G OE 6.13.0-g2bcb9cf535b8-dirty #149
> 2025-01-24T02:15:03.9018791Z [ 246.933249] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2025-01-24T02:15:03.9025896Z [ 246.933802] task:ip state:D stack:0 pid:1055 tgid:1055 ppid:1054 flags:0x00004002
> 2025-01-24T02:15:03.9028228Z [ 246.934564] Call Trace:
> 2025-01-24T02:15:03.9029758Z [ 246.934764] <TASK>
> 2025-01-24T02:15:03.9032572Z [ 246.934937] __schedule+0xa91/0xe80
> 2025-01-24T02:15:03.9035126Z [ 246.935224] schedule+0x41/0xb0
> 2025-01-24T02:15:03.9037992Z [ 246.935459] v9fs_evict_inode+0xfe/0x170
> 2025-01-24T02:15:03.9041469Z [ 246.935748] ? __pfx_var_wake_function+0x10/0x10
> 2025-01-24T02:15:03.9043837Z [ 246.936101] evict+0x1ef/0x360
> 2025-01-24T02:15:03.9046624Z [ 246.936340] __dentry_kill+0xb0/0x220
> 2025-01-24T02:15:03.9048855Z [ 246.936610] ? dput+0x3a/0x1d0
> 2025-01-24T02:15:03.9051128Z [ 246.936838] dput+0x114/0x1d0
> 2025-01-24T02:15:03.9053548Z [ 246.937069] __fput+0x136/0x2b0
> 2025-01-24T02:15:03.9056154Z [ 246.937305] task_work_run+0x89/0xc0
> 2025-01-24T02:15:03.9058593Z [ 246.937571] do_exit+0x2c6/0x9c0
> 2025-01-24T02:15:03.9061349Z [ 246.937816] do_group_exit+0xa4/0xb0
> 2025-01-24T02:15:03.9064401Z [ 246.938090] __x64_sys_exit_group+0x17/0x20
> 2025-01-24T02:15:03.9067235Z [ 246.938390] x64_sys_call+0x21a0/0x21a0
> 2025-01-24T02:15:03.9069924Z [ 246.938672] do_syscall_64+0x79/0x120
> 2025-01-24T02:15:03.9072746Z [ 246.938941] ? clear_bhb_loop+0x25/0x80
> 2025-01-24T02:15:03.9075581Z [ 246.939230] ? clear_bhb_loop+0x25/0x80
> 2025-01-24T02:15:03.9079275Z [ 246.939510] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 2025-01-24T02:15:03.9081976Z [ 246.939875] RIP: 0033:0x7fb86f66f21d
> 2025-01-24T02:15:03.9087533Z [ 246.940153] RSP: 002b:00007ffdb3cf93f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
> 2025-01-24T02:15:03.9092590Z [ 246.940689] RAX: ffffffffffffffda RBX: 00007fb86f785fa8 RCX: 00007fb86f66f21d
> 2025-01-24T02:15:03.9097722Z [ 246.941201] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 0000000000000000
> 2025-01-24T02:15:03.9102762Z [ 246.941705] RBP: 00007ffdb3cf9450 R08: 00007ffdb3cf93a0 R09: 0000000000000000
> 2025-01-24T02:15:03.9107940Z [ 246.942215] R10: 00007ffdb3cf92ff R11: 0000000000000202 R12: 0000000000000001
> 2025-01-24T02:15:03.9113002Z [ 246.942723] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fb86f785fc0
> 2025-01-24T02:15:03.9114614Z [ 246.943244] </TASK>

That looks very similar to something I saw in afs testing, with a similar
stack, but hanging in afs_evict_inode, waiting in
netfs_wait_for_outstanding_io.

David pointed to this bit, where there's a double get in
netfs_retry_read_subrequests(): since netfs_reissue_read() already takes
care of getting a ref on the subrequest, the extra get here is never
balanced by a put:

diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index 2290af0d51ac..53d62e31a4cc 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -152,7 +152,6 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
 				__clear_bit(NETFS_SREQ_BOUNDARY, &subreq->flags);
 			}
 
-			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
 			netfs_reissue_read(rreq, subreq);
 			if (subreq == to)
 				break;

That seems to help for my afs test case; I suspect it might help in your
case as well.

Marc
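
To illustrate the failure mode Marc describes: below is a minimal,
self-contained sketch in plain userspace C (not netfs code; the names
subreq, get_subreq, put_subreq and reissue are invented stand-ins) showing
how an extra get with no matching put keeps a refcount from ever reaching
zero, so anything waiting for the object to go away, like the eviction
paths in the traces above, blocks forever.

/*
 * Toy sketch only -- not netfs code.  All names here are invented
 * stand-ins for the real refcounting helpers.
 */
#include <stdio.h>

struct subreq {
        int refcount;                   /* references held on the object */
};

static void get_subreq(struct subreq *s)
{
        s->refcount++;
}

static void put_subreq(struct subreq *s)
{
        if (--s->refcount == 0)
                printf("object released; waiter can proceed\n");
}

/* The reissue path takes its own reference and drops it on completion. */
static void reissue(struct subreq *s)
{
        get_subreq(s);
        /* ... I/O would happen here ... */
        put_subreq(s);                  /* completion drops the reissue ref */
}

int main(void)
{
        struct subreq s = { .refcount = 1 };    /* creator's reference */

        get_subreq(&s);         /* the redundant get the diff above removes */
        reissue(&s);
        put_subreq(&s);         /* creator drops its reference */

        /*
         * Still 1: the release message never prints, and a real waiter
         * blocked until the count hits zero would hang, as in the
         * v9fs_evict_inode / netfs_wait_for_outstanding_io traces.
         */
        printf("final refcount = %d\n", s.refcount);
        return 0;
}

With the redundant get removed, as in the one-line diff above, the final
put drops the count to zero and the waiter is released.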