On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote: > > On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert > <bernd.schubert@xxxxxxxxxxx> wrote: > > > > On 12/2/24 10:45, Tomasz Figa wrote: > > > Hi everyone, > > > > > > On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky > > > <senozhatsky@xxxxxxxxxxxx> wrote: > > >> > > >> Cc-ing Tomasz > > >> > > >> On (24/11/28 11:23), Bernd Schubert wrote: > > >>>> Thanks for the pointers again, Bernd. > > >>>> > > >>>>> Miklos had asked for to abort the connection in v4 > > >>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@xxxxxxxxxxxxxx/raw > > >>>> > > >>>> OK, sounds reasonable. I'll try to give the series some testing in the > > >>>> coming days. > > >>>> > > >>>> // I still would probably prefer "seconds" timeout granularity. > > >>>> // Unless this also has been discussed already and Bernd has a link ;) > > >>> > > >>> > > >>> The issue is that is currently iterating through 256 hash lists + > > >>> pending + bg. > > >>> > > >>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@xxxxxxxxxxxxxx/raw > > >> > > >> Oh, I see. > > >> > > >>> Personally I would prefer a second list to avoid the check spike and latency > > >>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@xxxxxxxxxxx/raw > > >> > > >> That's good to know. I like the idea of less CPU usage in general, > > >> our devices a battery powered so everything counts, to some extent. > > >> > > >>> What is your opinion about that? I guess android and chromium have an > > >>> interest low latencies and avoiding cpu spikes? > > >> > > >> Good question. > > >> > > >> Can't speak for android, in chromeos we probably will keep it at 1 minute, > > >> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we > > >> use default value of 120 sec). There are setups that might use lower > > >> values, or even re-define default value, e.g.: > > >> > > >> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10 > > >> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10 > > >> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10 > > >> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10 > > >> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10 > > >> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10 > > >> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20 > > >> > > >> In those cases 1 minute fuse timeout will overshot HUNG_TASK_TIMEOUT > > >> and then the question is whether HUNG_TASK_PANIC is set. > > >> > > >> On the other hand, setups that set much lower timeout than > > >> DEFAULT_HUNG_TASK_TIMEOUT=120 will have extra CPU activities regardless, > > >> just because watchdogs will run more often. > > >> > > >> Tomasz, any opinions? > > > > > > First of all, thanks everyone for looking into this. > > Hi Sergey and Tomasz, > > Sorry for the late reply - I was out the last couple of days. Thanks > Bernd for weighing in and answering the questions! > > > > > > > How about keeping a list of requests in the FIFO order (in other > > > words: first entry is the first to timeout) and whenever the first > > > entry is being removed from the list (aka the request actually > > > completes), re-arming the timer to the timeout of the next request in > > > the list? This way we don't really have any timer firing unless there > > > is really a request that timed out. > > I think the issue with this is that we likely would end up wasting > more cpu cycles. For a busy FUSE server, there could be hundreds > (thousands?) of requests that happen within the span of > FUSE_TIMEOUT_TIMER_FREQ seconds. > > While working on the patch, one thing I considered was disarming the > timer in the timeout handler fuse_check_timeout() if no requests are > on the list, in order to accomodate for "quiet periods" (eg if the > FUSE server is inactive for a few minutes or hours) but ultimately > decided against it because of the overhead it'd incur per request (eg > check if the timer is disarmed, would most likely need to grab the > fc->lock as well since timer rearming would need to be synchronized > between background and non-background requests, etc.). > > All in all, imo I don't think having the timer trigger every 60 > seconds (what FUSE_TIMEOUT_TIMER_FREQ is set to) is too costly. > > > > > Requests are in FIFO order on the list and only head is checked. > > There are 256 hash lists per fuse device for requests currently > > in user space, though. > > > > > > > > (In fact, we could optimize it even further by opportunistically > > > scheduling a timer slightly later and opportunistically handling timed > > > out requests when other requests are being completed, but this would > > > be optimizing for the slow path, so probably an overkill.) > > > > > > As for the length of the request timeout vs the hung task watchdog > > > timeout, my opinion is that we should make sure that the hung task > > > watchdog doesn't hit in any case, simply because a misbehaving > > > userspace process must not be able to panic the kernel. In the > > > blk-core, the blk_io_schedule() function [1] uses > > > sysctl_hung_task_timeout_secs to determine the maximum length of a > > > single uninterruptible sleep. I suppose we could use the same > > > calculation to obtain our timeout number. What does everyone think? > > > > > > [1] https://elixir.bootlin.com/linux/v6.12.1/source/block/blk-core.c#L1232 > > > > I think that is a good idea. > > Btw, just something to note, the fuse request timeout has an upper > margin of error associated with it. > > Copying over from the commit message - > > "Please note that these timeouts are not 100% precise. The request may > take an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested max > timeout due to how it's internally implemented." > > For example, if a server sets the request timeout config to 10 > minutes, the server could be aborted after 11 minutes > (FUSE_TIMEOUT_TIMER_FREQ is set to 60 seconds internally) instead of > 10 minutes. > Let me add +Brian Geffon who also was thinking about the right timeout value. > > Thanks, > Joanne > > > > > > Thanks, > > Bernd