Sure: here’s a backtrace and req->args printed: (gdb) bt full #0 0xffffffff82000ff3 in __x86_indirect_thunk_r8 () at arch/x86/lib/retpoline.S:40 No locals. #1 0xffffffff8137ec68 in fuse_request_end (fc=0xffff88813a57be00, req=0xffff88813a5a3770) at fs/fuse/dev.c:329 fiq = 0xffff88813a57be48 async = true #2 0xffffffff81382d88 in fuse_dev_do_write (fud=0xffff88813a57be00, cs=0xffffc90002fafa00, nbytes=4294967294) at fs/fuse/dev.c:1912 err = 0 fc = 0xffff88813a57be00 fpq = 0xffff888132815f48 req = 0xffff88813a5a3770 oh = { len = 16, error = -2, unique = 66942 } #3 0xffffffff81382e69 in fuse_dev_write (iocb=0xffffc900008cbe48, from=0xffffc900008cbe20) at fs/fuse/dev.c:1934 cs = { write = 0, req = 0xffff88813a5a3770, iter = 0xffffc900008cbe20, pipebufs = 0x0 <fixed_percpu_data>, currbuf = 0x0 <fixed_percpu_data>, pipe = 0x0 <fixed_percpu_data>, nr_segs = 0, pg = 0x0 <fixed_percpu_data>, len = 0, offset = 24, move_pages = 0 } fud = 0xffff888132815f40 #4 0xffffffff8120122e in call_write_iter (file=<optimized out>, iter=<optimized out>, kio=<optimized out>) at ./include/linux/fs.h:1901 No locals. #5 new_sync_write (filp=0xffff888119886b00, buf=<optimized out>, len=<optimized out>, ppos=0xffffc900008cbee8) at fs/read_write.c:483 iov = { iov_base = 0xc0008ec008, iov_len = 16 } kiocb = { ki_filp = 0xffff888119886b00, ki_pos = 0, ki_complete = 0x0 <fixed_percpu_data>, private = 0x0 <fixed_percpu_data>, ki_flags = 0, ki_hint = 0, ki_ioprio = 0, ki_cookie = 0 } iter = { type = 5, iov_offset = 0, count = 0, { iov = 0xffffc900008cbe20, kvec = 0xffffc900008cbe20, bvec = 0xffffc900008cbe20, pipe = 0xffffc900008cbe20 }, { nr_segs = 0, { head = 0, start_head = 0 } } } ret = <optimized out> #6 0xffffffff812012e4 in __vfs_write (file=<optimized out>, p=<optimized out>, count=<optimized out>, pos=<optimized out>) at fs/read_write.c:496 No locals. #7 0xffffffff81203f04 in vfs_write (pos=<optimized out>, count=16, buf=<optimized out>, file=<optimized out>) at fs/read_write.c:558 ret = 16 ret = <optimized out> #8 vfs_write (file=0xffff888119886b00, buf=0xc0008ec008 "\020", count=16, pos=0xffffc900008cbee8) at fs/read_write.c:542 ret = 16 #9 0xffffffff812041b2 in ksys_write (fd=<optimized out>, buf=0xc0008ec008 "\020", count=16) at fs/read_write.c:611 pos = 0 ppos = <optimized out> f = <optimized out> ret = 824643076104 #10 0xffffffff81204245 in __do_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:623 No locals. #11 __se_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:620 ret = <optimized out> ret = <optimized out> #12 __x64_sys_write (regs=<optimized out>) at fs/read_write.c:620 No locals. #13 0xffffffff810028a8 in do_syscall_64 (nr=<optimized out>, regs=0xffffc900008cbf58) at arch/x86/entry/common.c:294 ti = <optimized out> #14 0xffffffff81e0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:175 No locals. #15 0x0000000000000000 in ?? () No symbol table info available. (gdb) p *req->args $5 = { nodeid = 18446683600620026424, opcode = 2167928246, in_numargs = 65535, out_numargs = 65535, force = false, noreply = false, nocreds = false, in_pages = false, out_pages = false, out_argvar = true, page_zeroing = true, page_replace = false, in_args = {{ size = 978828800, value = 0x2fafce0 }, { size = 978992728, value = 0xffffffff8138efaa <fuse_alloc_forget+26> }, { size = 50002688, value = 0xffffffff8138635f <fuse_lookup_name+255> }}, out_args = {{ size = 570, value = 0xffffc90002fafb10 }, { size = 6876, value = 0x3000000001adc }}, end = 0x1000100000001 } Independently, as a separate test, I have also modified the source like this: bool async; bool async_early = req->args->end; if (test_and_set_bit(FR_FINISHED, &req->flags)) goto put_request; async = req->args->end; …and printed the value of async and async_early. async is true, async_early is false. Perhaps some other routine is modifying the request, and checking req->args->end early enough masks that bug? On Tue, Feb 11, 2020 at 11:55 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > On Sun, Feb 9, 2020 at 9:09 AM <michael+lkml@xxxxxxxxxxxxx> wrote: > > > > From: Michael Stapelberg <michael+lkml@xxxxxxxxxxxxx> > > > > Hey, > > > > I recently ran into this, too. The symptom for me is that processes using the > > affected FUSE file system hang indefinitely, sync(2) system calls hang > > indefinitely, and even triggering an abort via echo 1 > > > /sys/fs/fuse/connections/*/abort does not get the file system unstuck (there is > > always 1 request still pending). Only removing power will get the machine > > unstuck. > > > > I’m triggering this when building packages for https://distr1.org/, which uses a > > FUSE daemon (written in Go using the jacobsa/fuse package) to provide package > > contents. > > > > I bisected the issue to commit > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2b319d1f6f92a4ced9897678113d176ee16ae85d > > > > With that commit, I run into a kernel oops within ≈1 minute after starting my > > batch build. With the commit before, I can batch build for many minutes without > > issues. > > Pretty weird. I'm not seeing how this could change behavior, as the > args->end value is not changed after being initialized, and so moving > the test later should not make a difference. > > Could you print out the complete contents of req->args? > > Thanks, > Miklos