On Wed, Oct 18, 2017 at 05:01:57PM +0200, Bernhard Kaindl wrote:
> Hi Greg!
>
> On 17.10.2017 17:57, Sebastian Andrzej Siewior wrote:
> > Upstream commit 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e
>
> This race happens when a process exit is preempted at the wrong time and
> we confirm the bug fixed by this commit to happen on Linux-3.18 systems:
>
> This is how we are affected without this fix which is missing in 3.18.x:
> RT Process fails to make progress -> HW watchdog fires -> System resets.
>
> We were able to see the commit's sequence of events and the CR3 having the
> PGD of a dead process using ftrace + SW watchdog.
>
> Systems with SCHED_FIFO tasks are especially vulnerable, because when
> the SCHED_FIFO task with the highest priority gets the deceased PGD@CR3, the
> task will page fault forever without making progress and no other process
> can be scheduled anymore.
>
> If this SCHED_FIFO task is triggering an HW watchdog, the HW watchdog
> will fire, but if not, the system will ping, but not do anything else.
>
> With kernel.sched_rt_runtime_us > 0, SCHED_OTHER processes could cause
> a context switch after kernel.sched_rt_period_us expires, so usually
> this would allow the system to recover, because then CR3 would be switched,
> but this is too late, and a real-time system would have failed
> at this point already.
>
> With kernel.sched_rt_runtime_us < 0, the only recovery in this case is a HW
> watchdog resetting the machine, but with devastating loss of function until
> the system is up again.
>
> All UP preemptible-kernel x86 real-time systems, including industrial
> control/automation, SCADA, Linux-based PLCs (e.g. using Intel Quark),
> are definitely affected when process termination collides with HW/SW
> interrupts.
>
> Non-real time systems: Except for some threads occasionally failing to make
> progress, the system will recover:
> Other processes will eventually be scheduled, causing CR3 to be loaded again
> correctly from task->mm->pgd, resolving the problem.
>
> > This patch is already part of various stable trees but is missing in the
> >   v3.18
> >   v4.1
>
> Yes, the long-term branches of 3.2, 3.10, 3.16 and 4.4 have got the fix
> (long time ago!), 4.9 already has it merged mainline.
>
> > tree and applies cleanly on top of
> >   v3.18.69
> >   v4.1.43
> >
> > I've been contacted by Bernhard Kaindl (Cc:) and he asked about the
> > whereabouts of the patch in the two stable trees. He can confirm that
> > this patch cures his problem on the v3.18 stable tree he is using.
> > He assumes that the same problem might occur on the v4.1 tree and should
> > be fixed by the patch but he has no working setup with v4.1 kernel to
> > confirm this.
>
> I confirm - here is a quick summary of what we found:
>
> We saw that our watchdog process got the PGD of a dead process in CR3,
> causing failure to pulse the watchdog because of the page fault loop
> described in the commit log.
>
> We had a lab of 16 machines available for testing the crash fixed by this
> commit.
>
> We found this fix by pure luck thanks to Google after a lot of searches by
> several people. With the fix, over this weekend, in the lab, we didn't
> trigger this issue anymore.
>
> (actually, we found another issue in our own code and had an unknown machine
> hang for which to debug, we need more specific HW which we don't have ATM,
> but it is likely that this is also the same issue caused by our own bug)
>
> Before having the fix, we demonstrated the sequence of events which the
> commit log describes within one hour on a single machine exactly.
>
> With Linux-4.4.64 (which does have this fix), we didn't see this bug.
>
> Because it appears to fix both 3.18 and 4.4, it makes sense to apply it to
> the v4.1.x longterm branch too.
Thanks for the detailed description, much appreciated. I've queued it
up for 3.18, it's up to Sasha to do it for 4.1.

thanks again,

greg k-h