On Tue, Sep 18, 2018 at 03:21:08PM +0200, Steffen Trumtrar wrote:
>
> Hi all!
>
> I'm seeing an issue with the dw_wdt watchdog on the SoCFPGA ARM platform
> with the latest linux-rt v4.18.5-rt4. Actually I seem to have the same
> problem, that these patches try to fix:
>
> 38a1222ae4f364d5bd5221fe305dbb0889f45d15
> Author: Christophe Leroy <christophe.leroy@xxxxxx>
> AuthorDate: Fri Dec 8 11:18:35 2017 +0100
> Commit: Wim Van Sebroeck <wim@xxxxxxxxx>
> CommitDate: Thu Dec 28 20:45:57 2017 +0100
>
> Follows: v4.15-rc3 (345)
> Precedes: v4.16-rc1 (13997)
>
> watchdog: core: make sure the watchdog worker always works
>
> When running a command like 'chrt -f 50 dd if=/dev/zero of=/dev/null',
> the watchdog_worker fails to service the HW watchdog and the
> HW watchdog fires long before the watchdog soft timeout.
>
> At the moment, the watchdog_worker is invoked as a delayed work.
> Delayed works are handled by non realtime kernel threads. The
> WQ_HIGHPRI flag only increases the niceness of that threads.
>
> This patch replaces the delayed work logic by kthread delayed work,
> and sets the associated kernel task to SCHED_FIFO with the highest
> priority, in order to ensure that the watchdog worker will run as
> soon as possible.
>
>
> 1ff688209e2ed23f699269b9733993e2ce123fd2
> Author: Christophe Leroy <christophe.leroy@xxxxxx>
> AuthorDate: Thu Jan 18 12:11:21 2018 +0100
> Commit: Wim Van Sebroeck <wim@xxxxxxxxx>
> CommitDate: Sun Jan 21 12:44:59 2018 +0100
>
> Follows: v4.15-rc3 (349)
> Precedes: v4.16-rc1 (13993)
>
> watchdog: core: make sure the watchdog_worker is not deferred
>
> commit 4cd13c21b207e ("softirq: Let ksoftirqd do its job") has the
> effect of deferring timer handling in case of high CPU load, hence
> delaying the delayed work allthought the worker is running which
> high realtime priority.
>
> As hrtimers are not managed by softirqs, this patch replaces the
> delayed work by a plain work and uses an hrtimer to schedule that work.

The above two commits are trying very hard to ensure timely wakeup and
execution of the watchdogd thread: first by moving to kthread delayed
work, and second by moving to vanilla kthread work + hrtimer.

This is sufficient on mainline, because hrtimer expiry functions are
unconditionally executed in hardirq context.

With PREEMPT_RT_FULL, however, hrtimer expiry functions are executed in
softirq context unless explicitly opted out. That means that with
PREEMPT_RT_FULL the expiry (and therefore the watchdogd wakeup) may be
starved indefinitely if there are runnable RT tasks of higher priority
than the softirq callback thread (ktimersoftd @ SCHED_FIFO 1 by
default). This is a priority inversion: the highest-priority watchdogd
thread ends up waiting on a priority 1 thread.

One possible solution is to opt out of the hrtimer softirq deferral by
using HRTIMER_MODE_HARD; however, the expiry function will then need to
be vetted for use in hardirq context with PREEMPT_RT_FULL. From a
cursory glance at the kthread_worker locking, it is not hardirq safe. :-\

   Julia
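
For illustration only, the starvation described above can be sketched
as a toy userspace model. This is not kernel code and makes several
assumptions: a single CPU, strict fixed-priority scheduling in whole
ticks, a CPU hog that never blocks, watchdogd at priority 99, and
ktimersoftd at priority 1. The names toy_task, toy_pick,
toy_watchdog_ping_tick and TOY_EXPIRY_TICK are hypothetical and exist
only in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: one CPU, strict fixed-priority scheduling. */
struct toy_task {
	const char *name;
	int prio;      /* larger value = higher priority, as with SCHED_FIFO */
	bool runnable;
};

/* Return the highest-priority runnable task, or NULL if none is runnable. */
static struct toy_task *toy_pick(struct toy_task *tasks, size_t n)
{
	struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].runnable)
			continue;
		if (best == NULL || tasks[i].prio > best->prio)
			best = &tasks[i];
	}
	return best;
}

/* Tick at which the (hypothetical) watchdog timer expires. */
#define TOY_EXPIRY_TICK 1

/*
 * Simulate `ticks` scheduler ticks with a CPU hog at priority `hog_prio`.
 *
 * hard_expiry == true:  the expiry callback runs in hardirq context, so
 *                       watchdogd is woken at the expiry tick regardless
 *                       of which task is running.
 * hard_expiry == false: the expiry only makes ktimersoftd runnable; the
 *                       callback (and the watchdogd wakeup) happens only
 *                       once ktimersoftd actually gets the CPU.
 *
 * Returns the tick at which watchdogd first runs, or -1 if it never does.
 */
int toy_watchdog_ping_tick(bool hard_expiry, int hog_prio, int ticks)
{
	enum { HOG, SOFTD, WDOG, NTASKS };
	struct toy_task tasks[NTASKS] = {
		[HOG]   = { "hog",         hog_prio, true  },
		[SOFTD] = { "ktimersoftd", 1,        false },
		[WDOG]  = { "watchdogd",   99,       false },
	};

	assert(ticks > 0);

	for (int t = 0; t < ticks; t++) {
		if (t == TOY_EXPIRY_TICK) {
			if (hard_expiry)
				tasks[WDOG].runnable = true;
			else
				tasks[SOFTD].runnable = true;
		}

		struct toy_task *cur = toy_pick(tasks, NTASKS);

		if (cur == &tasks[WDOG])
			return t;
		if (cur == &tasks[SOFTD]) {
			/* Expiry callback runs here and wakes watchdogd. */
			tasks[SOFTD].runnable = false;
			tasks[WDOG].runnable = true;
		}
		/* Otherwise the hog consumed this tick. */
	}
	return -1;
}
```

With the hog at priority 50 (as in the 'chrt -f 50 dd ...' case from
the first commit message), toy_watchdog_ping_tick(false, 50, 1000)
returns -1 because ktimersoftd never gets the CPU, while
toy_watchdog_ping_tick(true, 50, 1000) returns 1. The model says
nothing about whether the expiry function is actually safe to run in
hardirq context, which is the open question with the kthread_worker
locking.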