Hi, On Thu, May 4, 2023 at 7:36 PM Nicholas Piggin <npiggin@xxxxxxxxx> wrote: > > On Fri May 5, 2023 at 8:13 AM AEST, Douglas Anderson wrote: > > From: Colin Cross <ccross@xxxxxxxxxxx> > > > > Implement a hardlockup detector that doesn't doesn't need any extra > > arch-specific support code to detect lockups. Instead of using > > something arch-specific we will use the buddy system, where each CPU > > watches out for another one. Specifically, each CPU will use its > > softlockup hrtimer to check that the next CPU is processing hrtimer > > interrupts by verifying that a counter is increasing. > > Powerpc's watchdog has an SMP checker, did you see it? No, I wasn't aware of it. Interesting, it seems to basically enable both types of hardlockup detectors together. If that really catches more lockups, it seems like we could do the same thing for the buddy system. If people want, I don't think it would be very hard to make the buddy system _not_ exclusive of the perf system. Instead of having the buddy system implement the "weak" functions I could just call the buddy functions in the right places directly and leave the "weak" functions for a more traditional hardlockup detector to implement. Opinions? Maybe after all this lands, the powerpc watchdog could move to use the common code? As evidenced by this patch series, there's not really a reason for the SMP detection to be platform specific. > It's all to > all rather than buddy which makes it more complicated but arguably > bit better functionality. Can you come up with an example crash where the "all to all" would work better than the simple buddy system provided by this patch? It seems like they would be equivalent, but I could be missing something. Specifically they both need at least one non-locked-up CPU to detect a problem. If one or more CPUs is locked up then we'll always detect it. I suppose maybe you could provide a better error message at lockup time saying that several CPUs were locked up and that could be helpful. For now, I'd keep the current buddy system the way it is and if you want to provide a patch improving things to be "all-to-all" in the future that would be interesting to review.