Re: [PATCH RESEND v9 2/3] fuse: add optional kernel-enforced timeout for requests

Bernd Schubert <bernd.schubert@xxxxxxxxxxx> · Mon, 2 Dec 2024 15:43:39 +0100

On 12/2/24 10:45, Tomasz Figa wrote:
> Hi everyone,
> 
> On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
> <senozhatsky@xxxxxxxxxxxx> wrote:
>>
>> Cc-ing Tomasz
>>
>> On (24/11/28 11:23), Bernd Schubert wrote:
>>>> Thanks for the pointers again, Bernd.
>>>>
>>>>> Miklos had asked for to abort the connection in v4
>>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@xxxxxxxxxxxxxx/raw
>>>>
>>>> OK, sounds reasonable. I'll try to give the series some testing in the
>>>> coming days.
>>>>
>>>> // I still would probably prefer "seconds" timeout granularity.
>>>> // Unless this also has been discussed already and Bernd has a link ;)
>>>
>>>
>>> The issue is that is currently iterating through 256 hash lists +
>>> pending + bg.
>>>
>>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@xxxxxxxxxxxxxx/raw
>>
>> Oh, I see.
>>
>>> Personally I would prefer a second list to avoid the check spike and latency
>>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@xxxxxxxxxxx/raw
>>
>> That's good to know.  I like the idea of less CPU usage in general,
>> our devices a battery powered so everything counts, to some extent.
>>
>>> What is your opinion about that? I guess android and chromium have an
>>> interest low latencies and avoiding cpu spikes?
>>
>> Good question.
>>
>> Can't speak for android, in chromeos we probably will keep it at 1 minute,
>> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
>> use default value of 120 sec). There are setups that might use lower
>> values, or even re-define default value, e.g.:
>>
>> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>>
>> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT
>> and then the question is whether HUNG_TASK_PANIC is set.
>>
>> On the other hand, setups that set much lower timeout than
>> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless,
>> just because watchdogs will run more often.
>>
>> Tomasz, any opinions?
> 
> First of all, thanks everyone for looking into this.
> 
> How about keeping a list of requests in the FIFO order (in other
> words: first entry is the first to timeout) and whenever the first
> entry is being removed from the list (aka the request actually
> completes), re-arming the timer to the timeout of the next request in
> the list? This way we don't really have any timer firing unless there
> is really a request that timed out.

Requests are in FIFO order on the list and only head is checked. 
There are 256 hash lists per fuse device for requests currently
in user space, though.

> 
> (In fact, we could optimize it even further by opportunistically
> scheduling a timer slightly later and opportunistically handling timed
> out requests when other requests are being completed, but this would
> be optimizing for the slow path, so probably an overkill.)
> 
> As for the length of the request timeout vs the hung task watchdog
> timeout, my opinion is that we should make sure that the hung task
> watchdog doesn't hit in any case, simply because a misbehaving
> userspace process must not be able to panic the kernel. In the
> blk-core, the blk_io_schedule() function [1] uses
> sysctl_hung_task_timeout_secs to determine the maximum length of a
> single uninterruptible sleep. I suppose we could use the same
> calculation to obtain our timeout number. What does everyone think?
> 
> [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232

I think that is a good idea.

Thanks,
Bernd