On 12/4/24 15:51, Brian Geffon wrote:
> On Wed, Dec 4, 2024 at 9:40 AM Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>>
>> On Tue, Dec 3, 2024 at 4:29 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>>>
>>> On Mon, Dec 2, 2024 at 6:43 AM Bernd Schubert
>>> <bernd.schubert@xxxxxxxxxxx> wrote:
>>>>
>>>> On 12/2/24 10:45, Tomasz Figa wrote:
>>>>> Hi everyone,
>>>>>
>>>>> On Thu, Nov 28, 2024 at 8:55 PM Sergey Senozhatsky
>>>>> <senozhatsky@xxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Cc-ing Tomasz
>>>>>>
>>>>>> On (24/11/28 11:23), Bernd Schubert wrote:
>>>>>>>> Thanks for the pointers again, Bernd.
>>>>>>>>
>>>>>>>>> Miklos had asked to abort the connection in v4
>>>>>>>>> https://lore.kernel.org/all/CAJfpegsiRNnJx7OAoH58XRq3zujrcXx94S2JACFdgJJ_b8FdHw@xxxxxxxxxxxxxx/raw
>>>>>>>>
>>>>>>>> OK, sounds reasonable. I'll try to give the series some testing in the
>>>>>>>> coming days.
>>>>>>>>
>>>>>>>> // I still would probably prefer "seconds" timeout granularity.
>>>>>>>> // Unless this also has been discussed already and Bernd has a link ;)
>>>>>>>
>>>>>>>
>>>>>>> The issue is that it is currently iterating through 256 hash lists +
>>>>>>> pending + bg.
>>>>>>>
>>>>>>> https://lore.kernel.org/all/CAJnrk1b7bfAWWq_pFP=4XH3ddc_9GtAM2mE7EgWnx2Od+UUUjQ@xxxxxxxxxxxxxx/raw
>>>>>>
>>>>>> Oh, I see.
>>>>>>
>>>>>>> Personally I would prefer a second list to avoid the check spike and latency
>>>>>>> https://lore.kernel.org/linux-fsdevel/9ba4eaf4-b9f0-483f-90e5-9512aded419e@xxxxxxxxxxx/raw
>>>>>>
>>>>>> That's good to know. I like the idea of less CPU usage in general,
>>>>>> our devices are battery powered so everything counts, to some extent.
>>>>>>
>>>>>>> What is your opinion about that? I guess android and chromium have an
>>>>>>> interest in low latencies and avoiding cpu spikes?
>>>>>>
>>>>>> Good question.
>>>>>>
>>>>>> Can't speak for android, in chromeos we probably will keep it at 1 minute,
>>>>>> but this is because our DEFAULT_HUNG_TASK_TIMEOUT is larger than that (we
>>>>>> use default value of 120 sec). There are setups that might use lower
>>>>>> values, or even re-define default value, e.g.:
>>>>>>
>>>>>> arch/arc/configs/axs101_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/axs103_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/axs103_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/hsdk_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/vdk_hs38_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/arc/configs/vdk_hs38_smp_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=10
>>>>>> arch/powerpc/configs/mvme5100_defconfig:CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=20
>>>>>>
>>>>>> In those cases 1 minute fuse timeout will overshoot HUNG_TASK_TIMEOUT
>>>>>> and then the question is whether HUNG_TASK_PANIC is set.
>
> In my opinion this is a good argument for having the hung task timeout
> and a fuse timeout independent. The hung task timeout is for hung
> kernel threads, in this situation we're potentially taking too long in
> userspace but that doesn't necessarily mean the system is hung. I
> think a loop which does an interruptible wait with a timeout of 1/2
> the hung task timeout would make sense to ensure the hung task timeout
> doesn't hit. There might be situations where we want a fuse timeout
> which is larger than the hung task timeout, perhaps a file system
> being read over a satellite internet connection?

For a network file system, the remote server might also just hang, and
one might want to wait much longer than 1/2 the hung task timeout for
recovery.

Thanks,
Bernd
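The chunked wait Brian suggests in the quoted text can be sketched in userspace C roughly as follows. This is only an illustration under assumptions, not kernel or FUSE code: `wait_chunked`, `wait_slice_fn`, `fake_server` and `fake_wait_slice` are hypothetical names, timeouts are whole seconds, and the simulated clock stands in for whatever interruptible timed wait the kernel would actually use. The point it shows is that each individual sleep is capped at half the hung task timeout, while the overall fuse timeout can be much larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: sleep for up to slice_secs, return true if the
 * reply arrived during that slice. */
typedef bool (*wait_slice_fn)(unsigned int slice_secs, void *ctx);

/*
 * Wait for a reply for at most fuse_timeout seconds, but never sleep
 * longer than hung_task_timeout / 2 in one go, so the waiter wakes up
 * before a hung task detector with that timeout would fire.
 * Returns 0 if the reply arrived, -1 if the fuse timeout expired.
 * *slices_used reports how many separate waits were performed.
 */
static int wait_chunked(unsigned int fuse_timeout,
                        unsigned int hung_task_timeout,
                        wait_slice_fn wait_slice, void *ctx,
                        unsigned int *slices_used)
{
    unsigned int max_slice = hung_task_timeout / 2;
    unsigned int remaining = fuse_timeout;

    assert(wait_slice != NULL && slices_used != NULL);
    if (max_slice == 0)
        max_slice = 1;          /* avoid a zero-length slice */

    *slices_used = 0;
    while (remaining > 0) {
        unsigned int slice = remaining < max_slice ? remaining : max_slice;

        (*slices_used)++;
        if (wait_slice(slice, ctx))
            return 0;           /* server replied */
        remaining -= slice;     /* woke up in time, keep waiting */
    }
    return -1;                  /* overall fuse timeout expired */
}

/* Simulated server (assumption for the example): replies once
 * `reply_at` seconds of waiting have accumulated. */
struct fake_server {
    unsigned int elapsed;
    unsigned int reply_at;
};

static bool fake_wait_slice(unsigned int slice_secs, void *ctx)
{
    struct fake_server *s = ctx;

    s->elapsed += slice_secs;
    return s->elapsed >= s->reply_at;
}
```

With a hung task timeout of 10 and a fuse timeout of 60, no single wait exceeds 5 seconds, yet a slow server still gets the full 60 seconds to answer. Making the two values separate parameters also covers the network file system case above, where the fuse timeout is deliberately much larger than the hung task timeout.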