Re: [PATCH v2 0/2] fuse: add timeout option for requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Aug 4, 2024 at 10:05 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Sun, Aug 4, 2024 at 12:48 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > On Sat, Aug 3, 2024 at 3:05 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > >
> > > On Wed, Jul 31, 2024 at 7:47 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Aug 1, 2024 at 2:46 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Jul 31, 2024 at 10:52 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Jul 30, 2024 at 7:14 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Jul 31, 2024 at 2:16 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Mon, Jul 29, 2024 at 11:00 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jul 30, 2024 at 8:28 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > > > > > > > > >
> > > > > > > > > > There are situations where fuse servers can become unresponsive or take
> > > > > > > > > > too long to reply to a request. Currently there is no upper bound on
> > > > > > > > > > how long a request may take, which may be frustrating to users who get
> > > > > > > > > > stuck waiting for a request to complete.
> > > > > > > > > >
> > > > > > > > > > This patchset adds a timeout option for requests and two dynamically
> > > > > > > > > > configurable fuse sysctls "default_request_timeout" and "max_request_timeout"
> > > > > > > > > > for controlling/enforcing timeout behavior system-wide.
> > > > > > > > > >
> > > > > > > > > > Existing fuse servers will not be affected unless they explicitly opt into the
> > > > > > > > > > timeout.
> > > > > > > > > >
> > > > > > > > > > v1: https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@xxxxxxxxx/
> > > > > > > > > > Changes from v1:
> > > > > > > > > > - Add timeout for background requests
> > > > > > > > > > - Handle resend race condition
> > > > > > > > > > - Add sysctls
> > > > > > > > > >
> > > > > > > > > > Joanne Koong (2):
> > > > > > > > > >   fuse: add optional kernel-enforced timeout for requests
> > > > > > > > > >   fuse: add default_request_timeout and max_request_timeout sysctls
> > > > > > > > > >
> > > > > > > > > >  Documentation/admin-guide/sysctl/fs.rst |  17 +++
> > > > > > > > > >  fs/fuse/Makefile                        |   2 +-
> > > > > > > > > >  fs/fuse/dev.c                           | 187 +++++++++++++++++++++++-
> > > > > > > > > >  fs/fuse/fuse_i.h                        |  30 ++++
> > > > > > > > > >  fs/fuse/inode.c                         |  24 +++
> > > > > > > > > >  fs/fuse/sysctl.c                        |  42 ++++++
> > > > > > > > > >  6 files changed, 293 insertions(+), 9 deletions(-)
> > > > > > > > > >  create mode 100644 fs/fuse/sysctl.c
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > 2.43.0
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hello Joanne,
> > > > > > > > >
> > > > > > > > > Thanks for your update.
> > > > > > > > >
> > > > > > > > > I have tested your patches using my test case, which is similar to the
> > > > > > > > > hello-fuse [0] example, with an additional change as follows:
> > > > > > > > >
> > > > > > > > > @@ -125,6 +125,8 @@ static int hello_read(const char *path, char *buf,
> > > > > > > > > size_t size, off_t offset,
> > > > > > > > >         } else
> > > > > > > > >                 size = 0;
> > > > > > > > >
> > > > > > > > > +       // TO trigger timeout
> > > > > > > > > +       sleep(60);
> > > > > > > > >         return size;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > [0] https://github.com/libfuse/libfuse/blob/master/example/hello.c
> > > > > > > > >
> > > > > > > > > However, it triggered a crash with the following setup:
> > > > > > > > >
> > > > > > > > > 1. Set FUSE timeout:
> > > > > > > > >   sysctl -w fs.fuse.default_request_timeout=10
> > > > > > > > >   sysctl -w fs.fuse.max_request_timeout = 20
> > > > > > > > >
> > > > > > > > > 2. Start FUSE daemon:
> > > > > > > > >   ./hello /tmp/fuse
> > > > > > > > >
> > > > > > > > > 3. Read from FUSE:
> > > > > > > > >   cat /tmp/fuse/hello
> > > > > > > > >
> > > > > > > > > 4. Kill the process within 10 seconds (to avoid the timeout being triggered).
> > > > > > > > >    Then the crash will be triggered.
> > > > > > > >
> > > > > > > > Hi Yafang,
> > > > > > > >
> > > > > > > > Thanks for trying this out on your use case!
> > > > > > > >
> > > > > > > > How consistently are you able to repro this?
> > > > > > >
> > > > > > > It triggers the crash every time.
> > > > > > >
> > > > > > > > I tried reproing using
> > > > > > > > your instructions above but I'm not able to get the crash.
> > > > > > >
> > > > > > > Please note that it is the `cat /tmp/fuse/hello` process that was
> > > > > > > killed, not the fuse daemon.
> > > > > > > The crash seems to occur when the fuse daemon wakes up after
> > > > > > > sleep(60). Please ensure that the fuse daemon can be woken up.
> > > > > > >
> > > > > >
> > > > > > I'm still not able to trigger the crash by killing the `cat
> > > > > > /tmp/fuse/hello` process. This is how I'm repro-ing
> > > > > >
> > > > > > 1) Add sleep to test code in
> > > > > > https://github.com/libfuse/libfuse/blob/master/example/hello.c
> > > > > > @@ -125,6 +126,9 @@ static int hello_read(const char *path, char *buf,
> > > > > > size_t size, off_t offset,
> > > > > >         } else
> > > > > >                 size = 0;
> > > > > >
> > > > > > +       sleep(60);
> > > > > > +       printf("hello_read woke up from sleep\n");
> > > > > > +
> > > > > >         return size;
> > > > > >  }
> > > > > >
> > > > > > 2)  Set fuse timeout to 10 seconds
> > > > > > sysctl -w fs.fuse.default_request_timeout=10
> > > > > >
> > > > > > 3) Start fuse daemon
> > > > > > ./example/hello ./tmp/fuse
> > > > > >
> > > > > > 4) Read from fuse
> > > > > > cat /tmp/fuse/hello
> > > > > >
> > > > > > 5) Get pid of cat process
> > > > > > top -b | grep cat
> > > > > >
> > > > > > 6) Kill cat process (within 10 seconds)
> > > > > >  sudo kill -9 <cat-pid>
> > > > > >
> > > > > > 7) Wait 60 seconds for fuse's read request to complete
> > > > > >
> > > > > > From what it sounds like, this is exactly what you are doing as well?
> > > > > >
> > > > > > I added some kernel-side logs and I'm seeing that the read request is
> > > > > > timing out after ~10 seconds and handled by the timeout handler
> > > > > > successfully.
> > > > > >
> > > > > > On the fuse daemon side, these are the logs I'm seeing from the above repro:
> > > > > > ./example/hello /tmp/fuse -f -d
> > > > > >
> > > > > > FUSE library version: 3.17.0
> > > > > > nullpath_ok: 0
> > > > > > unique: 2, opcode: INIT (26), nodeid: 0, insize: 104, pid: 0
> > > > > > INIT: 7.40
> > > > > > flags=0x73fffffb
> > > > > > max_readahead=0x00020000
> > > > > >    INIT: 7.40
> > > > > >    flags=0x4040f039
> > > > > >    max_readahead=0x00020000
> > > > > >    max_write=0x00100000
> > > > > >    max_background=0
> > > > > >    congestion_threshold=0
> > > > > >    time_gran=1
> > > > > >    unique: 2, success, outsize: 80
> > > > > > unique: 4, opcode: LOOKUP (1), nodeid: 1, insize: 46, pid: 673
> > > > > > LOOKUP /hello
> > > > > > getattr[NULL] /hello
> > > > > >    NODEID: 2
> > > > > >    unique: 4, success, outsize: 144
> > > > > > unique: 6, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 673
> > > > > > open flags: 0x8000 /hello
> > > > > >    open[0] flags: 0x8000 /hello
> > > > > >    unique: 6, success, outsize: 32
> > > > > > unique: 8, opcode: READ (15), nodeid: 2, insize: 80, pid: 673
> > > > > > read[0] 4096 bytes from 0 flags: 0x8000
> > > > > > unique: 10, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 673
> > > > > >    unique: 10, error: -38 (Function not implemented), outsize: 16
> > > > > > unique: 11, opcode: INTERRUPT (36), nodeid: 0, insize: 48, pid: 0
> > > > > > FUSE_INTERRUPT: reply to kernel to disable interrupt
> > > > > >    unique: 11, error: -38 (Function not implemented), outsize: 16
> > > > > >
> > > > > > unique: 12, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
> > > > > >    unique: 12, success, outsize: 16
> > > > > >
> > > > > > hello_read woke up from sleep
> > > > > >    read[0] 13 bytes from 0
> > > > > >    unique: 8, success, outsize: 29
> > > > > >
> > > > > >
> > > > > > Are these the debug logs you are seeing from the daemon side as well?
> > > > > >
> > > > > > Thanks,
> > > > > > Joanne
> > > > > > > >
> > > > > > > > From the crash logs you provided below, it looks like what's happening
> > > > > > > > is that if the process gets killed, the timer isn't getting deleted.
> > > > >
> > > > > When I looked at this log previously, I thought you were repro-ing by
> > > > > killing the fuse daemon process, not the cat process. When we kill the
> > > > > cat process, the timer shouldn't be getting deleted. (if the daemon
> > > > > itself is killed, the timers get deleted)
> > > > >
> > > > > > > > I'll look more into what happens in fuse when a process is killed and
> > > > > > > > get back to you on this.
> > > > >
> > > > > This is the flow of what is happening on the kernel side (verified by
> > > > > local printks) -
> > > > >
> > > > > `cat /tmp/fuse/hello`:
> > > > > Issues a FUSE_READ background request (via fuse_send_readpages(),
> > > > > fm->fc->async_read). This request will have a timeout of 10 seconds on
> > > > > it
> > > > >
> > > > > The cat process is killed:
> > > > > This does not clean up the request. The request is still on the fpq
> > > > > processing list.
> > > > >
> > > > > Timeout on request expires:
> > > > > The timeout handler runs and properly cleans up / frees the request.
> > > > >
> > > > > Fuse daemon wakes from sleep and replies to the request:
> > > > > In dev_do_write(), the kernel won't be able to find this request
> > > > > (since it timed out and was removed from the fpq processing list) and
> > > > > return with -ENOENT
> > > >
> > > > Thank you for your explanation.
> > > > I will verify if there are any issues with my test environment.
> > > >
> > > Hi Yafang,
> > >
> > > Would you mind adding these printks to your kernel when you run the
> > > repro and pasting what they show?
> > >
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -287,6 +287,9 @@ static void do_fuse_request_end(struct fuse_req
> > > *req, bool from_timer_callback)
> > >         struct fuse_conn *fc = fm->fc;
> > >         struct fuse_iqueue *fiq = &fc->iq;
> > >
> > > +       printk("do_fuse_request_end: req=%p, from_timer=%d,
> > > req->timer.func=%d\n",
> > > +              req, from_timer_callback, req->timer.function != NULL);
> > > +
> > >         if (from_timer_callback)
> > >                 req->out.h.error = -ETIME;
> > >
> > > @@ -415,6 +418,8 @@ static void fuse_request_timeout(struct timer_list *timer)
> > >  {
> > >         struct fuse_req *req = container_of(timer, struct fuse_req, timer);
> > >
> > > +       printk("fuse_request_timeout: req=%p\n", req);
> > > +
> > >         /*
> > >          * Request reply is being finished by the kernel right now.
> > >          * No need to time out the request.
> > > @@ -612,6 +617,7 @@ ssize_t fuse_simple_request(struct fuse_mount *fm,
> > > struct fuse_args *args)
> > >
> > >         if (!args->noreply)
> > >                 __set_bit(FR_ISREPLY, &req->flags);
> > > +       printk("fuse_simple_request: req=%p, op=%u\n", req, args->opcode);
> > >         __fuse_request_send(req);
> > >         ret = req->out.h.error;
> > >         if (!ret && args->out_argvar) {
> > > @@ -673,6 +679,7 @@ int fuse_simple_background(struct fuse_mount *fm,
> > > struct fuse_args *args,
> > >
> > >         fuse_args_to_req(req, args);
> > >
> > > +       printk("fuse_background_request: req=%p, op=%u\n", req, args->opcode);
> > >         if (!fuse_request_queue_background(req)) {
> > >                 fuse_put_request(req);
> > >
> > >
> > > When I run it on my side, I see
> > >
> > > [   68.117740] fuse_background_request: req=00000000874e2f14, op=26
> > > [   68.131440] do_fuse_request_end: req=00000000874e2f14,
> > > from_timer=0, req->timer.func=1
> > > [   71.558538] fuse_simple_request: req=00000000cf643ace, op=1
> > > [   71.559651] do_fuse_request_end: req=00000000cf643ace,
> > > from_timer=0, req->timer.func=1
> > > [   71.561044] fuse_simple_request: req=00000000f2c001f0, op=14
> > > [   71.562524] do_fuse_request_end: req=00000000f2c001f0,
> > > from_timer=0, req->timer.func=1
> > > [   71.563820] fuse_background_request: req=00000000584f2cc3, op=15
> > > [   78.580035] fuse_simple_request: req=00000000ecbee970, op=25
> > > [   78.582614] do_fuse_request_end: req=00000000ecbee970,
> > > from_timer=0, req->timer.func=1
> > > [   81.624722] fuse_request_timeout: req=00000000584f2cc3
> > > [   81.625443] do_fuse_request_end: req=00000000584f2cc3,
> > > from_timer=1, req->timer.func=1
> > > [   81.626377] fuse_background_request: req=00000000b2d792ed, op=18
> > > [   81.627623] do_fuse_request_end: req=00000000b2d792ed,
> > > from_timer=0, req->timer.func=1
> > >
> > > I'm seeing only one timer get called, on the read request (opcode=15),
> > > and I'm not seeing do_fuse_request_end having been called on that
> > > request before the timer is invoked.
> > > I'm curious to compare this against the logs on your end.
> >
> > The log on my side is as follows,
> Thank you Yafang. These logs are very helpful.
>
> >
> > [  283.329421] fuse_background_request: req=000000002b4f82d4, op=26
> > [  283.330043] do_fuse_request_end: req=000000002b4f82d4,
> > from_timer=0, req->timer.func=0
> > [  287.889844] fuse_simple_request: req=00000000865e85bf, op=3
> > [  287.889914] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.889933] fuse_simple_request: req=00000000865e85bf, op=22
> > [  287.889994] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890096] fuse_simple_request: req=00000000865e85bf, op=27
> > [  287.890130] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890142] fuse_simple_request: req=00000000865e85bf, op=28
> > [  287.890167] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890178] fuse_simple_request: req=00000000865e85bf, op=1
> > [  287.890191] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890209] fuse_simple_request: req=00000000865e85bf, op=28
> > [  287.890216] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  287.890222] fuse_background_request: req=00000000865e85bf, op=29
> > [  287.890230] do_fuse_request_end: req=00000000865e85bf,
> > from_timer=0, req->timer.func=0
> > [  312.311752] fuse_background_request: req=00000000a8da8b44, op=26
> > [  312.312249] do_fuse_request_end: req=00000000a8da8b44,
> > from_timer=0, req->timer.func=1
> > [  317.368786] fuse_simple_request: req=00000000bc4817dd, op=1
> > [  317.368871] do_fuse_request_end: req=00000000bc4817dd,
> > from_timer=0, req->timer.func=1
> > [  317.368910] fuse_simple_request: req=00000000bc4817dd, op=14
> > [  317.368942] do_fuse_request_end: req=00000000bc4817dd,
> > from_timer=0, req->timer.func=1
> > [  317.368967] fuse_simple_request: req=00000000bc4817dd, op=15
> > [  327.855189] fuse_request_timeout: req=00000000bc4817dd
> > [  327.855195] do_fuse_request_end: req=00000000bc4817dd,
> > from_timer=1, req->timer.func=1
> > [  327.855218] fuse_simple_request: req=00000000c34cc363, op=15
> > [  327.855328] fuse_simple_request: req=00000000c34cc363, op=25
> > [  327.855401] do_fuse_request_end: req=00000000c34cc363,
> > from_timer=0, req->timer.func=1
> > [  327.855496] fuse_background_request: req=00000000c34cc363, op=18
> > [  327.855508] do_fuse_request_end: req=00000000c34cc363,
> > from_timer=0, req->timer.func=1
> > [  338.095136] Oops: general protection fault, probably for
> > non-canonical address 0xdead00000000012a: 0000 [#1] PREEMPT SMP NOPTI
> > [  338.096415] CPU: 58 PID: 0 Comm: swapper/58 Kdump: loaded Not
> > tainted 6.10.0+ #8
> > [  338.098219] RIP: 0010:__run_timers+0x27e/0x360
> > [  338.098686] Code: 07 48 c7 43 08 00 00 00 00 48 85 c0 74 78 4d 8b
> > 2f 4c 89 6b 08 0f 1f 44 00 00 49 8b 45 00 49 8b 55 08 48 89 02 48 85
> > c0 74 04 <48> 89 50 08 4d 8b 65 18 49 c7 45 08 00 00 00 00 48 b8 22 01
> > 00 00
> > [  338.100381] RSP: 0018:ffffb4ef808bced8 EFLAGS: 00010086
> > [  338.100907] RAX: dead000000000122 RBX: ffff9827ffca13c0 RCX: 0000000000000001
> > [  338.101623] RDX: ffffb4ef808bcef8 RSI: 0000000000000000 RDI: ffff9827ffca13e8
> > [  338.102333] RBP: ffffb4ef808bcf70 R08: 000000000000008b R09: ffff9827ffca1430
> > [  338.103020] R10: ffffffff93e060c0 R11: 0000000000000089 R12: 0000000000000001
> > [  338.103726] R13: ffff97e9dc06a0a0 R14: 0000000100009200 R15: ffffb4ef808bcef8
> > [  338.104439] FS:  0000000000000000(0000) GS:ffff9827ffc80000(0000)
> > knlGS:0000000000000000
> > [  338.105229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  338.105795] CR2: 000000c002f99340 CR3: 0000000148254001 CR4: 0000000000370ef0
> > [  338.106502] Call Trace:
> > [  338.106836]  <IRQ>
> > [  338.107175]  ? show_regs+0x69/0x80
> > [  338.107603]  ? die_addr+0x38/0x90
> > [  338.108005]  ? exc_general_protection+0x236/0x490
> > [  338.108557]  ? asm_exc_general_protection+0x27/0x30
> > [  338.109095]  ? __run_timers+0x27e/0x360
> > [  338.109563]  ? __run_timers+0x1b4/0x360
> > [  338.110009]  ? kvm_sched_clock_read+0x11/0x20
> > [  338.110528]  ? sched_clock_noinstr+0x9/0x10
> > [  338.111002]  ? sched_clock+0x10/0x30
> > [  338.111447]  ? sched_clock_cpu+0x10/0x190
> > [  338.111914]  run_timer_softirq+0x3a/0x60
> > [  338.112406]  handle_softirqs+0x118/0x350
> > [  338.112859]  irq_exit_rcu+0x60/0x80
> > [  338.113295]  sysvec_apic_timer_interrupt+0x7f/0x90
> > [  338.113823]  </IRQ>
> > [  338.114147]  <TASK>
> > [  338.114447]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> > [  338.115002] RIP: 0010:default_idle+0xb/0x20
> > [  338.115498] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> > 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> > 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> > 00 90
> > [  338.117337] RSP: 0018:ffffb4ef8028fe18 EFLAGS: 00000246
> > [  338.117894] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001b48ebb3a1032
> > [  338.118673] RDX: 0000000000000001 RSI: ffffffff9412e060 RDI: ffff9827ffcbc8e0
> > [  338.119415] RBP: ffffb4ef8028fe20 R08: 0000004eb7fb01b4 R09: 0000000000000001
> > [  338.120151] R10: ffffffff93e56080 R11: 0000000000000001 R12: 0000000000000001
> > [  338.120872] R13: ffffffff9412e060 R14: ffffffff9412e0e0 R15: 0000000000000001
> > [  338.121615]  ? ct_kernel_exit.constprop.0+0x79/0x90
> > [  338.122171]  ? arch_cpu_idle+0x9/0x10
> > [  338.122602]  default_enter_idle+0x22/0x2f
> > [  338.123064]  cpuidle_enter_state+0x88/0x430
> > [  338.123556]  cpuidle_enter+0x34/0x50
> > [  338.123978]  call_cpuidle+0x22/0x50
> > [  338.124449]  cpuidle_idle_call+0xd2/0x120
> > [  338.124909]  do_idle+0x77/0xd0
> > [  338.125313]  cpu_startup_entry+0x2c/0x30
> > [  338.125763]  start_secondary+0x117/0x140
> > [  338.126240]  common_startup_64+0x13e/0x141
> > [  338.126711]  </TASK>
> >
> > In addition to the hello-fuse, there is another FUSE daemon, lxcfs,
> > running on my test server. After disabling lxcfs, the system no longer
> > panics, but there are still error logs:
> >
> > [  285.804534] fuse_background_request: req=0000000063502a93, op=26
> > [  285.805041] do_fuse_request_end: req=0000000063502a93,
> > from_timer=0, req->timer.func=1
> > [  290.967412] fuse_simple_request: req=000000003f362e4b, op=1
> > [  290.967480] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=0, req->timer.func=1
> > [  290.967517] fuse_simple_request: req=000000003f362e4b, op=14
> > [  290.967585] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=0, req->timer.func=1
> > [  290.967655] fuse_simple_request: req=000000003f362e4b, op=15
> > [  300.996023] fuse_request_timeout: req=000000003f362e4b
> > [  300.996030] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=1, req->timer.func=1
> > [  300.996066] fuse_simple_request: req=00000000b4182f02, op=15
> > [  300.996180] fuse_simple_request: req=000000003f362e4b, op=25
> > [  300.996185] ==================================================================
> > [  300.996980] BUG: KFENCE: use-after-free write in enqueue_timer+0x24/0xb0
> >
> > [  300.997788] Use-after-free write at 0x0000000022312cb7 (in kfence-#156):
> > [  300.998476]  enqueue_timer+0x24/0xb0
> > [  300.998479]  __mod_timer+0x23b/0x360
> > [  300.998481]  add_timer+0x20/0x30
> > [  300.998483]  fuse_simple_request+0x1bc/0x2f0 [fuse]
> > [  300.998506]  fuse_flush+0x1ac/0x1f0 [fuse]
> > [  300.998511]  filp_flush+0x39/0x90
> > [  300.998517]  filp_close+0x15/0x30
> > [  300.998519]  put_files_struct+0x77/0xe0
> > [  300.998522]  exit_files+0x47/0x60
> > [  300.998524]  do_exit+0x262/0x480
> > [  300.998528]  do_group_exit+0x34/0x90
> > [  300.998531]  get_signal+0x92f/0x980
> > [  300.998534]  arch_do_signal_or_restart+0x2a/0x100
> > [  300.998537]  syscall_exit_to_user_mode+0xe3/0x1a0
> > [  300.998541]  do_syscall_64+0x71/0x170
> > [  300.998545]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > [  300.998759] kfence-#156: 0x00000000b4182f02-0x0000000084fc5c46,
> > size=200, cache=ip4-frags
> >
> > [  300.998761] allocated by task 15064 on cpu 26 at 300.996061s:
> > [  300.998766]  fuse_request_alloc+0x21/0xb0 [fuse]
> > [  300.998771]  fuse_get_req+0xde/0x270 [fuse]
> > [  300.998775]  fuse_simple_request+0x33/0x2f0 [fuse]
> > [  300.998779]  fuse_do_readpage+0x15e/0x200 [fuse]
> > [  300.998783]  fuse_read_folio+0x29/0x60 [fuse]
> > [  300.998787]  filemap_read_folio+0x3b/0xe0
> > [  300.998791]  filemap_update_page+0x236/0x2d0
> > [  300.998792]  filemap_get_pages+0x225/0x390
> > [  300.998794]  filemap_read+0xed/0x3a0
> > [  300.998796]  generic_file_read_iter+0xb8/0x100
> > [  300.998798]  fuse_file_read_iter+0xd8/0x150 [fuse]
> > [  300.998804]  vfs_read+0x25e/0x340
> > [  300.998806]  ksys_read+0x67/0xf0
> > [  300.998808]  __x64_sys_read+0x19/0x20
> > [  300.998810]  x64_sys_call+0x1709/0x20b0
> > [  300.998813]  do_syscall_64+0x65/0x170
> > [  300.998815]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >
> > [  300.998817] freed by task 15064 on cpu 26 at 300.996084s:
> > [  300.998822]  fuse_put_request+0x89/0xf0 [fuse]
> > [  300.998826]  fuse_simple_request+0xe1/0x2f0 [fuse]
> > [  300.998830]  fuse_do_readpage+0x15e/0x200 [fuse]
> > [  300.998835]  fuse_read_folio+0x29/0x60 [fuse]
> > [  300.998839]  filemap_read_folio+0x3b/0xe0
> > [  300.998840]  filemap_update_page+0x236/0x2d0
> > [  300.998842]  filemap_get_pages+0x225/0x390
> > [  300.998844]  filemap_read+0xed/0x3a0
> > [  300.998846]  generic_file_read_iter+0xb8/0x100
> > [  300.998848]  fuse_file_read_iter+0xd8/0x150 [fuse]
> > [  300.998852]  vfs_read+0x25e/0x340
> > [  300.998854]  ksys_read+0x67/0xf0
> > [  300.998856]  __x64_sys_read+0x19/0x20
> > [  300.998857]  x64_sys_call+0x1709/0x20b0
> > [  300.998859]  do_syscall_64+0x65/0x170
> > [  300.998860]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> This is very interesting. These logs (and the ones above with the
> lxcfs server running concurrently) are showing that the read request
> was freed but not through the do_fuse_request_end path. It's weird
> that fuse_simple_request reached fuse_put_request without
> do_fuse_request_end having been called (which is the only place where
> FR_FINISHED gets set and wakes up the wait events in
> request_wait_answer).
>
> I'll take a deeper look tomorrow and try to make more sense of it.

Finally realized what's happening!
When we kill the cat program, if the request hasn't been sent out to
userspace yet when the fatal signal interrupts the
wait_event_interruptible and wait_event_killable in
request_wait_answer(), this will clean up the request manually (not
through the fuse_request_end() path), which doesn't delete the timer.

I'll fix this for v3.

Thank you for surfacing this and it would be much appreciated if you
could test out v3 when it's submitted to make sure.

Thanks!
Joanne

>
> >
> > [  300.999115] CPU: 26 PID: 15064 Comm: cat Kdump: loaded Not tainted 6.10.0+ #8
> > [  301.000803] ==================================================================
> > [  301.001695] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=0, req->timer.func=1
> > [  301.001723] fuse_background_request: req=000000003f362e4b, op=18
> > [  301.001767] do_fuse_request_end: req=000000003f362e4b,
> > from_timer=0, req->timer.func=1
> > [  311.235964] fuse_request_timeout: req=00000000b4182f02
> > [  311.235969] ------------[ cut here ]------------
> > [  311.235970] list_del corruption, ffff9a8072d3a000->next is
> > LIST_POISON1 (dead000000000100)
> > [  311.235982] WARNING: CPU: 26 PID: 0 at lib/list_debug.c:56
> > __list_del_entry_valid_or_report+0x8a/0xf0
> > [  311.236036] CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted:
> > G    B              6.10.0+ #8
> > [  311.236040] RIP: 0010:__list_del_entry_valid_or_report+0x8a/0xf0
> > [  311.236043] Code: 31 c0 5d c3 cc cc cc cc 48 c7 c7 60 7a 5e b0 e8
> > cc ea a4 ff 0f 0b 31 c0 5d c3 cc cc cc cc 48 c7 c7 88 7a 5e b0 e8 b6
> > ea a4 ff <0f> 0b 31 c0 5d c3 cc cc cc cc 48 89 ca 48 c7 c7 c0 7a 5e b0
> > e8 9d
> > [  311.236045] RSP: 0018:ffffb6364056ce60 EFLAGS: 00010282
> > [  311.236047] RAX: 0000000000000000 RBX: ffff9a8072d3a0a0 RCX: 0000000000000027
> > [  311.236048] RDX: ffff9a807f4a0848 RSI: 0000000000000001 RDI: ffff9a807f4a0840
> > [  311.236049] RBP: ffffb6364056ce60 R08: 0000000000000000 R09: ffffb6364056cce0
> > [  311.236050] R10: ffffb6364056ccd8 R11: ffffffffb1017ee8 R12: ffff9a8072d3a000
> > [  311.236051] R13: ffff9a420d5af000 R14: 0000000100002800 R15: ffff9a420d5af054
> > [  311.236054] FS:  0000000000000000(0000) GS:ffff9a807f480000(0000)
> > knlGS:0000000000000000
> > [  311.236056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  311.236057] CR2: 000000c000dc5000 CR3: 000000010cc38003 CR4: 0000000000370ef0
> > [  311.236058] Call Trace:
> > [  311.236059]  <IRQ>
> > [  311.236061]  ? show_regs+0x69/0x80
> > [  311.236065]  ? __warn+0x88/0x130
> > [  311.236068]  ? __list_del_entry_valid_or_report+0x8a/0xf0
> > [  311.236070]  ? report_bug+0x18f/0x1a0
> > [  311.236074]  ? handle_bug+0x40/0x70
> > [  311.236077]  ? exc_invalid_op+0x19/0x70
> > [  311.236079]  ? asm_exc_invalid_op+0x1b/0x20
> > [  311.236083]  ? __list_del_entry_valid_or_report+0x8a/0xf0
> > [  311.236086]  fuse_request_timeout+0x15c/0x1a0 [fuse]
> > [  311.236094]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> > [  311.236099]  call_timer_fn+0x2c/0x130
> > [  311.236102]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> > [  311.236106]  __run_timers+0x2c2/0x360
> > [  311.236108]  ? kvm_sched_clock_read+0x11/0x20
> > [  311.236110]  ? sched_clock_noinstr+0x9/0x10
> > [  311.236111]  ? sched_clock+0x10/0x30
> > [  311.236114]  ? sched_clock_cpu+0x10/0x190
> > [  311.236116]  run_timer_softirq+0x3a/0x60
> > [  311.236118]  handle_softirqs+0x118/0x350
> > [  311.236121]  irq_exit_rcu+0x60/0x80
> > [  311.236123]  sysvec_apic_timer_interrupt+0x7f/0x90
> > [  311.236124]  </IRQ>
> > [  311.236125]  <TASK>
> > [  311.236126]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> > [  311.236128] RIP: 0010:default_idle+0xb/0x20
> > [  311.236130] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> > 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> > 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> > 00 90
> > [  311.236131] RSP: 0018:ffffb6364018fe18 EFLAGS: 00000246
> > [  311.236133] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001c0582ca6ada0
> > [  311.236134] RDX: 0000000000000001 RSI: ffffffffb112e060 RDI: ffff9a807f4bc8e0
> > [  311.236135] RBP: ffffb6364018fe20 R08: 00000048770ca8ec R09: 0000000000000001
> > [  311.236135] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
> > [  311.236136] R13: ffffffffb112e060 R14: ffffffffb112e0e0 R15: 0000000000000001
> > [  311.236138]  ? ct_kernel_exit.constprop.0+0x79/0x90
> > [  311.236140]  ? arch_cpu_idle+0x9/0x10
> > [  311.236142]  default_enter_idle+0x22/0x2f
> > [  311.236144]  cpuidle_enter_state+0x88/0x430
> > [  311.236146]  cpuidle_enter+0x34/0x50
> > [  311.236150]  call_cpuidle+0x22/0x50
> > [  311.236151]  cpuidle_idle_call+0xd2/0x120
> > [  311.236154]  do_idle+0x77/0xd0
> > [  311.236156]  cpu_startup_entry+0x2c/0x30
> > [  311.236158]  start_secondary+0x117/0x140
> > [  311.236160]  common_startup_64+0x13e/0x141
> > [  311.236163]  </TASK>
> > [  311.236163] ---[ end trace 0000000000000000 ]---
> > [  311.236165] do_fuse_request_end: req=00000000b4182f02,
> > from_timer=1, req->timer.func=1
> > [  311.236166] ------------[ cut here ]------------
> > [  311.236167] refcount_t: underflow; use-after-free.
> > [  311.236174] WARNING: CPU: 26 PID: 0 at lib/refcount.c:28
> > refcount_warn_saturate+0xc2/0x110
> > [  311.236207] CPU: 26 PID: 0 Comm: swapper/26 Kdump: loaded Tainted:
> > G    B   W          6.10.0+ #8
> > [  311.236209] RIP: 0010:refcount_warn_saturate+0xc2/0x110
> > [  311.236211] Code: 01 e8 d2 72 a6 ff 0f 0b 5d c3 cc cc cc cc 80 3d
> > 33 d4 b1 01 00 75 81 48 c7 c7 30 69 5e b0 c6 05 23 d4 b1 01 01 e8 ae
> > 72 a6 ff <0f> 0b 5d c3 cc cc cc cc 80 3d 0d d4 b1 01 00 0f 85 59 ff ff
> > ff 48
> > [  311.236212] RSP: 0018:ffffb6364056cdf8 EFLAGS: 00010286
> > [  311.236213] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000027
> > [  311.236214] RDX: ffff9a807f4a0848 RSI: 0000000000000001 RDI: ffff9a807f4a0840
> > [  311.236215] RBP: ffffb6364056cdf8 R08: 0000000000000000 R09: ffffb6364056cc78
> > [  311.236216] R10: ffffb6364056cc70 R11: ffffffffb1017ee8 R12: ffff9a8072d3a000
> > [  311.236217] R13: ffff9a420d5af000 R14: ffff9a42426a6ec0 R15: ffff9a8072d3a010
> > [  311.236219] FS:  0000000000000000(0000) GS:ffff9a807f480000(0000)
> > knlGS:0000000000000000
> > [  311.236221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  311.236222] CR2: 000000c000dc5000 CR3: 000000010cc38003 CR4: 0000000000370ef0
> > [  311.236223] Call Trace:
> > [  311.236223]  <IRQ>
> > [  311.236224]  ? show_regs+0x69/0x80
> > [  311.236233]  ? __warn+0x88/0x130
> > [  311.236235]  ? refcount_warn_saturate+0xc2/0x110
> > [  311.236236]  ? report_bug+0x18f/0x1a0
> > [  311.236238]  ? handle_bug+0x40/0x70
> > [  311.236240]  ? exc_invalid_op+0x19/0x70
> > [  311.236242]  ? asm_exc_invalid_op+0x1b/0x20
> > [  311.236244]  ? refcount_warn_saturate+0xc2/0x110
> > [  311.236246]  ? refcount_warn_saturate+0xc2/0x110
> > [  311.236247]  fuse_put_request+0xc6/0xf0 [fuse]
> > [  311.236253]  do_fuse_request_end+0xcc/0x1e0 [fuse]
> > [  311.236258]  fuse_request_timeout+0xac/0x1a0 [fuse]
> > [  311.236263]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> > [  311.236267]  call_timer_fn+0x2c/0x130
> > [  311.236269]  ? __pfx_fuse_request_timeout+0x10/0x10 [fuse]
> > [  311.236274]  __run_timers+0x2c2/0x360
> > [  311.236275]  ? kvm_sched_clock_read+0x11/0x20
> > [  311.236277]  ? sched_clock_noinstr+0x9/0x10
> > [  311.236278]  ? sched_clock+0x10/0x30
> > [  311.236280]  ? sched_clock_cpu+0x10/0x190
> > [  311.236281]  run_timer_softirq+0x3a/0x60
> > [  311.236283]  handle_softirqs+0x118/0x350
> > [  311.236285]  irq_exit_rcu+0x60/0x80
> > [  311.236286]  sysvec_apic_timer_interrupt+0x7f/0x90
> > [  311.236288]  </IRQ>
> > [  311.236288]  <TASK>
> > [  311.236289]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
> > [  311.236291] RIP: 0010:default_idle+0xb/0x20
> > [  311.236293] Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90
> > 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 eb 07 0f 00 2d b3 51 33
> > 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> > 00 90
> > [  311.236294] RSP: 0018:ffffb6364018fe18 EFLAGS: 00000246
> > [  311.236295] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0001c0582ca6ada0
> > [  311.236296] RDX: 0000000000000001 RSI: ffffffffb112e060 RDI: ffff9a807f4bc8e0
> > [  311.236297] RBP: ffffb6364018fe20 R08: 00000048770ca8ec R09: 0000000000000001
> > [  311.236298] R10: ffffffffb0e56080 R11: 0000000000000001 R12: 0000000000000001
> > [  311.236299] R13: ffffffffb112e060 R14: ffffffffb112e0e0 R15: 0000000000000001
> > [  311.236300]  ? ct_kernel_exit.constprop.0+0x79/0x90
> > [  311.236302]  ? arch_cpu_idle+0x9/0x10
> > [  311.236304]  default_enter_idle+0x22/0x2f
> > [  311.236306]  cpuidle_enter_state+0x88/0x430
> > [  311.236308]  cpuidle_enter+0x34/0x50
> > [  311.236310]  call_cpuidle+0x22/0x50
> > [  311.236311]  cpuidle_idle_call+0xd2/0x120
> > [  311.236313]  do_idle+0x77/0xd0
> > [  311.236315]  cpu_startup_entry+0x2c/0x30
> > [  311.236317]  start_secondary+0x117/0x140
> > [  311.236318]  common_startup_64+0x13e/0x141
> > [  311.236320]  </TASK>
> > [  311.236321] ---[ end trace 0000000000000000 ]---
> >
> > I wish I could provide you with a clear explanation of what happened
> > in my test environment, but I haven't had the time to delve into the
> > details yet.
> >
> >
> > --
> > Regards
> > Yafang





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux