Hi Andrew,

Thanks a lot for paying attention!

On Thu, Oct 24, 2024 at 10:05 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, 22 Oct 2024 19:47:34 +0800 Lance Yang <ioworker0@xxxxxxxxx> wrote:
>
> > Hi all,
> >
> > This patchset adds a counter, hung_task_detect_count, to track the number of
> > times hung tasks are detected. This counter provides a straightforward way
> > to monitor hung task events without manually checking dmesg logs.
> >
> > With this counter in place, system issues can be spotted quickly, allowing
> > admins to step in promptly before system load spikes occur, even if the
> > hung_task_warnings value has been decreased to 0 well before.
> >
> > Recently, we encountered a situation where warnings about hung tasks were
> > buried in dmesg logs during load spikes. Introducing this counter could
> > have helped us detect such issues earlier and improve our analysis efficiency.
> >
>
> Isn't the answer to this problem "write a better parser"? I mean,

Yeah, I certainly agree that having a good parser is important, and I'm
working on that as well ;)

> we're providing userspace with information which is already available.

IMHO, there are two reasons why this counter remains valuable:

1) It allows us to detect hung tasks promptly, before load spikes occur,
using simple and common monitoring tools like Prometheus.

2) It ensures that we remain aware of hung tasks even when the
hung_task_warnings value has already been decremented to 0 long before.

Thanks again for your time!
Lance