Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests

Tomasz Figa <tfiga@xxxxxxxxxxxx> · Wed, 4 Dec 2024 23:40:32 +0900

On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
> <bernd.schubert@xxxxxxxxxxx> wrote:
> >
> > On 12/2/24 10:45, Tomasz Figa wrote:
> > > Hi everyone,
> > >
> > > On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> > > <senozhatsky@xxxxxxxxxxxx> wrote:
> > >>
> > >> Cc-ing Tomasz
> > >>
> > >> On (24/11/28 11:23), Bernd Schubert wrote:
> > >>>> Thanks for the pointers again, Bernd.
> > >>>>
> > >>>>> Miklos had asked for to abort the connection in v4
> > >>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@xxxxxxxxxxxxxx/raw
> > >>>>
> > >>>> OK, sounds reasonable. I'll try to give the series some testing in the
> > >>>> coming days.
> > >>>>
> > >>>> // I still would probably prefer "seconds" timeout granularity.
> > >>>> // Unless this also has been discussed already and Bernd has a link ;)
> > >>>
> > >>>
> > >>> The issue is that is currently iterating through 256 hash lists +
> > >>> pending + bg.
> > >>>
> > >>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@xxxxxxxxxxxxxx/raw
> > >>
> > >> Oh, I see.
> > >>
> > >>> Personally I would prefer a second list to avoid the check spike and latency
> > >>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@xxxxxxxxxxx/raw
> > >>
> > >> That's good to know.  I like the idea of less CPU usage in general,
> > >> our devices a battery powered so everything counts, to some extent.
> > >>
> > >>> What is your opinion about that? I guess android and chromium have an
> > >>> interest low latencies and avoiding cpu spikes?
> > >>
> > >> Good question.
> > >>
> > >> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> > >> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> > >> use default value of 120 sec). There are setups that might use lower
> > >> values, or even re-define default value, e.g.:
> > >>
> > >> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> > >> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
> > >>
> > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> > >> and then the question is whether HUNG_TASK_PANIC is set.
> > >>
> > >> On the other hand, setups that set much lower timeout than
> > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> > >> just because watchdogs will run more often.
> > >>
> > >> Tomasz, any opinions?
> > >
> > > First of all, thanks everyone for looking into this.
>
> Hi Sergey and Tomasz,
>
> Sorry for the late reply - I was out the last couple of days. Thanks
> Bernd for weighing in and answering the questions!
>
> > >
> > > How about keeping a list of requests in the FIFO order (in other
> > > words: first entry is the first to timeout) and whenever the first
> > > entry is being removed from the list (aka the request actually
> > > completes), re-arming the timer to the timeout of the next request in
> > > the list? This way we don't really have any timer firing unless there
> > > is really a request that timed out.
>
> I think the issue with this is that we likely would end up wasting
> more cpu cycles. For a busy FUSE server, there could be hundreds
> (thousands?) of requests that happen within the span of
> FUSE_TIMEOUT_TIMER_FREQ seconds.
>
> While working on the patch, one thing I considered was disarming the
> timer in the timeout handler fuse_check_timeout() if no requests are
> on the list, in order to accomodate for "quiet periods" (eg if the
> FUSE server is inactive for a few minutes or hours) but ultimately
> decided against it because of the overhead it'd incur per request (eg
> check if the timer is disarmed, would most likely need to grab the
> fc->lock as well since timer rearming would need to be synchronized
> between background and non-background requests, etc.).
>
> All in all, imo I don't think having the timer trigger every 60
> seconds (what FUSE_TIMEOUT_TIMER_FREQ is set to) is too costly.
>
> >
> > Requests are in FIFO order on the list and only head is checked.
> > There are 256 hash lists per fuse device for requests currently
> > in user space, though.
> >
> > >
> > > (In fact, we could optimize it even further by opportunistically
> > > scheduling a timer slightly later and opportunistically handling timed
> > > out requests when other requests are being completed, but this would
> > > be optimizing for the slow path, so probably an overkill.)
> > >
> > > As for the length of the request timeout vs the hung task watchdog
> > > timeout, my opinion is that we should make sure that the hung task
> > > watchdog doesn't hit in any case, simply because a misbehaving
> > > userspace process must not be able to panic the kernel. In the
> > > blk-core, the blk_io_schedule() function [1] uses
> > > sysctl_hung_task_timeout_secs to determine the maximum length of a
> > > single uninterruptible sleep. I suppose we could use the same
> > > calculation to obtain our timeout number. What does everyone think?
> > >
> > > [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232
> >
> > I think that is a good idea.
>
> Btw, just something to note, the fuse request timeout has an upper
> margin of error associated with it.
>
> Copying over from the commit message -
>
> "Please note that these timeouts are not 100% precise. The request may
> take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max
> timeout due to how it's internally implemented."
>
> For example, if a server sets the request timeout config to 10
> minutes, the server could be aborted after 11 minutes
> (FUSE_TIMEOUT_TIMER_FREQ is set to 60 seconds internally) instead of
> 10 minutes.
>

Let me add +Brian Geffon who also was thinking about the right timeout value.

>
> Thanks,
> Joanne
> >
> >
> > Thanks,
> > Bernd