Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests

Tomasz Figa <tfiga@xxxxxxxxxxxx> · Mon, 2 Dec 2024 18:45:10 +0900

Hi everyone,

On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
<senozhatsky@xxxxxxxxxxxx> wrote:
>
> Cc-ing Tomasz
>
> On (24/11/28 11:23), Bernd Schubert wrote:
> > > Thanks for the pointers again, Bernd.
> > >
> > >> Miklos had asked for to abort the connection in v4
> > >> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@xxxxxxxxxxxxxx/raw
> > >
> > > OK, sounds reasonable. I'll try to give the series some testing in the
> > > coming days.
> > >
> > > // I still would probably prefer "seconds" timeout granularity.
> > > // Unless this also has been discussed already and Bernd has a link ;)
> >
> >
> > The issue is that is currently iterating through 256 hash lists +
> > pending + bg.
> >
> > https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@xxxxxxxxxxxxxx/raw
>
> Oh, I see.
>
> > Personally I would prefer a second list to avoid the check spike and latency
> > https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@xxxxxxxxxxx/raw
>
> That's good to know.  I like the idea of less CPU usage in general,
> our devices a battery powered so everything counts, to some extent.
>
> > What is your opinion about that? I guess android and chromium have an
> > interest low latencies and avoiding cpu spikes?
>
> Good question.
>
> Can't speak for android, in chromeos we probably will keep it at 1 minute,
> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
> use default value of 120 sec). There are setups that might use lower
> values, or even re-define default value, e.g.:
>
> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>
> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
> and then the question is whether HUNG_TASK_PANIC is set.
>
> On the other hand, setups that set much lower timeout than
> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
> just because watchdogs will run more often.
>
> Tomasz, any opinions?

First of all, thanks everyone for looking into this.

How about keeping a list of requests in the FIFO order (in other
words: first entry is the first to timeout) and whenever the first
entry is being removed from the list (aka the request actually
completes), re-arming the timer to the timeout of the next request in
the list? This way we don't really have any timer firing unless there
is really a request that timed out.

(In fact, we could optimize it even further by opportunistically
scheduling a timer slightly later and opportunistically handling timed
out requests when other requests are being completed, but this would
be optimizing for the slow path, so probably an overkill.)

As for the length of the request timeout vs the hung task watchdog
timeout, my opinion is that we should make sure that the hung task
watchdog doesn't hit in any case, simply because a misbehaving
userspace process must not be able to panic the kernel. In the
blk-core, the blk_io_schedule() function [1] uses
sysctl_hung_task_timeout_secs to determine the maximum length of a
single uninterruptible sleep. I suppose we could use the same
calculation to obtain our timeout number. What does everyone think?

[1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232

Best regards,
Tomasz