On 09/18/2018 06:21 AM, Steffen Trumtrar wrote:
Hi all!

I'm seeing an issue with the dw_wdt watchdog on the SoCFPGA ARM platform with the latest linux-rt v4.18.5-rt4. Actually I seem to have the same problem, that these patches try to fix:

38a1222ae4f364d5bd5221fe305dbb0889f45d15
Author:     Christophe Leroy <christophe.leroy@xxxxxx>
AuthorDate: Fri Dec 8 11:18:35 2017 +0100
Commit:     Wim Van Sebroeck <wim@xxxxxxxxx>
CommitDate: Thu Dec 28 20:45:57 2017 +0100
Follows:    v4.15-rc3 (345)
Precedes:   v4.16-rc1 (13997)

    watchdog: core: make sure the watchdog worker always works

    When running a command like 'chrt -f 50 dd if=/dev/zero of=/dev/null', the watchdog_worker fails to service the HW watchdog and the HW watchdog fires long before the watchdog soft timeout.

    At the moment, the watchdog_worker is invoked as a delayed work. Delayed works are handled by non realtime kernel threads. The WQ_HIGHPRI flag only increases the niceness of that threads.

    This patch replaces the delayed work logic by kthread delayed work, and sets the associated kernel task to SCHED_FIFO with the highest priority, in order to ensure that the watchdog worker will run as soon as possible.

1ff688209e2ed23f699269b9733993e2ce123fd2
Author:     Christophe Leroy <christophe.leroy@xxxxxx>
AuthorDate: Thu Jan 18 12:11:21 2018 +0100
Commit:     Wim Van Sebroeck <wim@xxxxxxxxx>
CommitDate: Sun Jan 21 12:44:59 2018 +0100
Follows:    v4.15-rc3 (349)
Precedes:   v4.16-rc1 (13993)

    watchdog: core: make sure the watchdog_worker is not deferred

    commit 4cd13c21b207e ("softirq: Let ksoftirqd do its job") has the effect of deferring timer handling in case of high CPU load, hence delaying the delayed work allthought the worker is running which high realtime priority.

    As hrtimers are not managed by softirqs, this patch replaces the delayed work by a plain work and uses an hrtimer to schedule that work.

If I run the same test or 'chrt 50 hackbench 20 -l 150' or any task where I change the prio with chrt and that runs long enough, I get a system reset from the watchdog because it times out. This only happens if the watchdog is already enabled on boot and CONFIG_PREEMPT_RT_FULL is set.

Any idea if I'm missing something essential? If I understand it correctly, the two commits fix the framework and therefore the dw_wdt driver doesn't need any updates.
I find your e-mail confusing, sorry. The subject says that the watchdog is not triggering, the description says that it is triggering when it should not. You also provide no information if the watchdog is active (open from user space) or not. There is some indication in "This only happens if the watchdog is already enabled on boot" but that isn't really precise - it may be enabled on boot but still open.

On top of that, your e-mail suggests that the problem may be a regression, since you refer to a specific kernel release, yet you provide no information if the very same test worked with a different kernel version, or what that kernel version would be.

Please not only describe what you are doing, but also provide the complete context. Specifically,

- Did this ever work ? If yes, what are working kernel versions ?
- Is the watchdog device open ?
- Does it make a difference if it is ?
- What is the configured watchdog timeout (both from BIOS/ROMMON and in Linux) ?

Thanks,
Guenter
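
One way to collect part of the context asked for above without opening /dev/watchdog (which would itself start or take over the watchdog) is to read the watchdog sysfs attributes. The following is only a minimal sketch: it assumes the kernel was built with CONFIG_WATCHDOG_SYSFS, that the device of interest is watchdog0, and the helper name read_watchdog_info is made up for illustration. Note that the "state" attribute reports whether the watchdog is active, which is not necessarily the same as the device being open from user space.

```python
from pathlib import Path

# Attribute names taken from the kernel's sysfs-class-watchdog ABI.
# They only exist when the kernel has CONFIG_WATCHDOG_SYSFS enabled, and
# some (e.g. timeleft) only if the driver implements them.
ATTRS = ("identity", "state", "timeout", "timeleft", "bootstatus", "nowayout")


def read_watchdog_info(base="/sys/class/watchdog/watchdog0"):
    """Return {attribute: value or None} for one watchdog device.

    'base' defaults to watchdog0, which is an assumption; adjust it if
    the system has more than one watchdog.
    """
    info = {}
    for name in ATTRS:
        try:
            info[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Attribute missing or unreadable on this system.
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in read_watchdog_info().items():
        print(f"{key}: {value if value is not None else '(not available)'}")
```

Running this once right after boot and once with the watchdog daemon started would show whether "state" and "timeout" differ between the two cases.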