On Thu 2023-05-25 13:08:04, Doug Anderson wrote: > Hi, > > On Thu, May 25, 2023 at 9:27 AM Petr Mladek <pmladek@xxxxxxxx> wrote: > > > > On Fri 2023-05-19 10:18:38, Douglas Anderson wrote: > > > Implement a hardlockup detector that doesn't doesn't need any extra > > > arch-specific support code to detect lockups. Instead of using > > > something arch-specific we will use the buddy system, where each CPU > > > watches out for another one. Specifically, each CPU will use its > > > softlockup hrtimer to check that the next CPU is processing hrtimer > > > interrupts by verifying that a counter is increasing. > > > > > > --- a/kernel/watchdog.c > > > +++ b/kernel/watchdog.c > > > @@ -85,7 +85,7 @@ __setup("nmi_watchdog=", hardlockup_panic_setup); > > > > > > #endif /* CONFIG_HARDLOCKUP_DETECTOR */ > > > > > > -#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF) > > > +#if defined(CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER) > > > > > > static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts); > > > static DEFINE_PER_CPU(int, hrtimer_interrupts_saved); > > > @@ -106,6 +106,14 @@ notrace void arch_touch_nmi_watchdog(void) > > > } > > > EXPORT_SYMBOL(arch_touch_nmi_watchdog); > > > > > > +void watchdog_hardlockup_touch_cpu(unsigned int cpu) > > > +{ > > > + per_cpu(watchdog_hardlockup_touched, cpu) = true; > > > + > > > + /* Match with smp_rmb() in watchdog_hardlockup_check() */ > > > + smp_wmb(); > > > > It is great that you described where the related barrier is. > > > > Another important information is what exactly is synchronized. > > And I am actually not sure what we are synchronizing here. > > > > My understanding is that a write barrier should synchronize > > related writes, for example: > > > > X = ...; > > /* Make sure that X is modified before Y */ > > smp_wmb(); > > Y = ...; > > > > And the related read barrier should synchronize the related reads, > > for example: > > > > if (test(Y)) { > > /* > > * Make sure that we use the updated X when > > * we saw the updated Y. > > */ > > smp_rmb(); > > do_something(X); > > } > > > > IMHO, we do not need any barrier here because we have only > > one variable "watchdog_hardlockup_touched" here. > > watchdog_hardlockup_check() will either see the updated value > > or not. But it does not synchronize it against any other > > variables or values. > > Fair. These barriers were present in the original buddy lockup > detector that we've been carrying in ChromeOS but that doesn't > necessarily mean that they were there for a good reason. > > Reasoning about weakly ordered memory always makes my brain hurt and I > never feel confident at the end that I got the right answer and, of > course, this is coupled by the fact that if I have a logic error in my > reasoning that it might cause a rare / subtle bug. :( Sure. Lockless code is complicated. > When possible I > try to write code that uses full blown locks to make it easier to > reason about (even if less efficient), Makes sense. There should be a good reason to use lockless code because it is complicated to do it right and maintain. > but that's not necessarily > possible here. While we obviously don't just want to sprinkle barriers > all over the code, IMO it's not a terrible sin to put a barrier in a > case where it makes it easier to reason about the order of things. I understand this. Well, it is always important to describe the the reason why the barrier was added there. Even when it is wrong, it gives a clue what was the motivation. Otherwise, it is hard to do any changes on the code later. I guess that it might be more problematic for you because you probably are not the original author. > In any case, I guess in this case I would worry about some sort of > ordering race when enabling / disabling the buddy lockup detector. At > the end of the buddy's watchdog_hardlockup_enable() / > watchdog_hardlockup_disable() we adjust the "watchdog_cpus" which > changes buddy assignments. Without a barrier, I _think_ it would be > possible for a new CPU to notice the change in buddies without > noticing the touch. Does that match your understanding? Now when > reasoning about CPUs going online/offline we need to consider this > extra case and we have to decide if there's any chance it could lead > to a false lockup detection. With the memory barriers here, it's a > little easier to think about. This makes sense. I did not think about the hotplug scenario. Well, I suggest to move the barriers into the buddy code and describe it there. It does not make sense to use the barriers for the perf hardlockup. I mean something like: diff --git a/kernel/watchdog_buddy.c b/kernel/watchdog_buddy.c index fee45af2e5bd..ebe71dcb55e6 100644 --- a/kernel/watchdog_buddy.c +++ b/kernel/watchdog_buddy.c @@ -52,6 +52,13 @@ void watchdog_hardlockup_enable(unsigned int cpu) if (next_cpu < nr_cpu_ids) watchdog_hardlockup_touch_cpu(next_cpu); + /* + * Makes sure that watchdog is touched on this CPU before + * other CPUs could see it in watchdog_cpus. The counter + * part is in watchdog_buddy_check_hardlockup(). + */ + smp_wmb(); + cpumask_set_cpu(cpu, &watchdog_cpus); } @@ -69,6 +76,13 @@ void watchdog_hardlockup_disable(unsigned int cpu) if (next_cpu < nr_cpu_ids) watchdog_hardlockup_touch_cpu(next_cpu); + /* + * Makes sure that watchdog is touched on the next CPU before + * this CPU disappear in watchdog_cpus. The counter part is in + * watchdog_buddy_check_hardlockup(). + */ + smp_wmb(); + cpumask_clear_cpu(cpu, &watchdog_cpus); } @@ -89,5 +103,12 @@ void watchdog_buddy_check_hardlockup(unsigned long hrtimer_interrupts) if (next_cpu >= nr_cpu_ids) return; + /* + * Make sure that the watchdog was touched on next CPU when + * watchdog_next_cpu() returned another one because of + * a change in watchdog_hardlockup_enable()/disable(). + */ + smp_rmb(); + watchdog_hardlockup_check(next_cpu, NULL); } > Did the above convince you about keeping the barriers? If so, do you > have any suggested comment that would make it clearer? > > > > > +} > > > + > > > static bool is_hardlockup(unsigned int cpu) > > > { > > > int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu)); > > > @@ -443,11 +454,15 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) > > > struct pt_regs *regs = get_irq_regs(); > > > int duration; > > > int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace; > > > + unsigned long hrtimer_interrupts; > > > > > > if (!watchdog_enabled) > > > return HRTIMER_NORESTART; > > > > > > - watchdog_hardlockup_kick(); > > > + hrtimer_interrupts = watchdog_hardlockup_kick(); > > > + > > > + /* test for hardlockups */ > > > > I would omit the comment. It is not valid when perf detector is used. > > And checking the buddy is clear from the function name. > > > > > + watchdog_buddy_check_hardlockup(hrtimer_interrupts); > > > > I would personally move this into watchdog_hardlockup_kick(). > > watchdog_timer_fn() is already complex enough. And checking > > the buddy when kicking a CPU makes sense. > > Sure, I'll add that to my list of things to follow-up with. > > > > Also I would not pass "hrtimer_interrupts". I guess that it is > > just an optimization. It is an extra churn in the code. IMHO, > > is is not wort it. This code does not need to be super optimized. > > The main reason I did it is that "hrtimer_interrupts" is static to > watchdog.c now. If I don't pass it in then I have to make it > non-static and add it to the header. That also means anyone looking at > the variable and figuring out how it is read/written needs to go > search for other people that reference it. I feel like it's cleaner to > just pass it in. If you feel strongly that I should change this then > let me know, but otherwise I'll plan to leave this how I have it. I do not have strong opinion. For me, the more important change is to move watchdog_buddy_check_hardlockup() into watchdog_hardlockup_kick(). watchdog_timer_fn() is already too complex. > > > > /* kick the softlockup detector */ > > > if (completion_done(this_cpu_ptr(&softlockup_completion))) { > > > --- a/lib/Kconfig.debug > > > +++ b/lib/Kconfig.debug > > > @@ -1035,10 +1035,55 @@ config BOOTPARAM_SOFTLOCKUP_PANIC > > > > > > Say N if unsure. > > > > > > -config HARDLOCKUP_DETECTOR_PERF > > > +# Both the "perf" and "buddy" hardlockup detectors count hrtimer > > > +# interrupts. This config enables functions managing this common code. > > > +config HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > > bool > > > select SOFTLOCKUP_DETECTOR > > > > > > +config HARDLOCKUP_DETECTOR_PERF > > > + bool > > > + depends on HAVE_HARDLOCKUP_DETECTOR_PERF > > > + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > > + > > > +config HARDLOCKUP_DETECTOR_BUDDY > > > + bool > > > + depends on SMP > > > + select HARDLOCKUP_DETECTOR_COUNTS_HRTIMER > > > + > > > +# For hardlockup detectors you can have one directly provided by the arch > > > +# or use a "non-arch" one. If you're using a "non-arch" one that is > > > +# further divided the perf hardlockup detector (which, confusingly, needs > > > +# arch-provided perf support) and the buddy hardlockup detector (which just > > > +# needs SMP). In either case, using the "non-arch" code conflicts with > > > +# the NMI watchdog code (which is sometimes used directly and sometimes used > > > +# by the arch-provided hardlockup detector). > > > +config HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > > + bool > > > + depends on (HAVE_HARDLOCKUP_DETECTOR_PERF || SMP) && !HAVE_NMI_WATCHDOG > > > + default y > > > + > > > +config HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > + bool "Prefer the buddy CPU hardlockup detector" > > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH && HAVE_HARDLOCKUP_DETECTOR_PERF && SMP > > > > Huh, I have big troubles to scratch my head around this check: > > > > HAVE_HARDLOCKUP_DETECTOR_NON_ARCH depends on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP > > > > and this depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH and again > > on HAVE_HARDLOCKUP_DETECTOR_PERF and SMP. > > The goal is to have "HARDLOCKUP_DETECTOR_PREFER_BUDDY" to show up as > an option if there is an option _other_ than the buddy. If there's not > more than one hardlockup detector to pick from then there's no reason > to ask the person configuring the kernel which one they'd prefer. At > the moment, if you have an "arch" lockup detector then you're stuck > with it, so you only get a choice if a "perf" detector is available > and you've got SMP. > > Ah, so I guess this could be simplified to: > > depends on HAVE_HARDLOCKUP_DETECTOR_PERF && SMP Yes, this is much better. > OK, I'll add that to the list. > > > > > + help > > > + Say Y here to prefer the buddy hardlockup detector over the perf one. > > > + > > > + With the buddy detector, each CPU uses its softlockup hrtimer > > > + to check that the next CPU is processing hrtimer interrupts by > > > + verifying that a counter is increasing. > > > + > > > + This hardlockup detector is useful on systems that don't have > > > + an arch-specific hardlockup detector or if resources needed > > > + for the hardlockup detector are better used for other things. > > > + > > > +# This will select the appropriate non-arch hardlockdup detector > > > +config HARDLOCKUP_DETECTOR_NON_ARCH > > > + bool > > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > > + select HARDLOCKUP_DETECTOR_BUDDY if !HAVE_HARDLOCKUP_DETECTOR_PERF || HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > + select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF && !HARDLOCKUP_DETECTOR_PREFER_BUDDY > > > + > > > # > > > # Enables a timestamp based low pass filter to compensate for perf based > > > # hard lockup detection which runs too fast due to turbo modes. > > > @@ -1053,9 +1098,10 @@ config HARDLOCKUP_CHECK_TIMESTAMP > > > config HARDLOCKUP_DETECTOR > > > bool "Detect Hard Lockups" > > > depends on DEBUG_KERNEL && !S390 > > > > Is there any reason why S390 could not or do not want to use the buddy > > hardlockup detector. > > This isn't a new dependency, but it's a good question. Looking at the > git history, I see commit dea20a3fbdd0 ("[PATCH] Disable > DETECT_SOFTLOCKUP for s390"). ...and it looks like the softlockup > detector still says it's broken on s390. That would mean that the > buddy detector is broken too. It seems that s390 wanted to disable the watchdog completely, see the commit dea20a3fbdd08e5 ("[PATCH] Disable DETECT_SOFTLOCKUP for s390") because they got too many false positives. > > > > - depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH > > > + depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || HAVE_HARDLOCKUP_DETECTOR_ARCH > > > select LOCKUP_DETECTOR > > > - select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF > > > + select HARDLOCKUP_DETECTOR_NON_ARCH if HAVE_HARDLOCKUP_DETECTOR_NON_ARCH > > > > Anyway, the configuration of the hard lockup detectors is insane and > > this patchset makes it even worse, especially the new > > HARDLOCKUP_DETECTOR_NON_ARCH stuff. > > > > It seems that sparc, powerpc and s390 are somehow special. Do you > > still have in mind how they are distinguished using the Kconfig > > variables? > > > > For example, I am pretty confused by the meaning of HAVE_NMI_WATCHDOG. > > > > And sparc has its own variant of > > watchdog_hardlockup_enable()/disable(). It means that it is > > arch-specific. Does it work with the 13th patch which made > > watchdog_hardlockup_enable()/disable() to be watchdog-hardlockup-type > > specific? Is is somehow related to HAVE_NMI_WATCHDOG? > > Does this replace the entire watchdog only only the enable part? > > > > I think that we need to make this more straightforward. But I first > > need to understand the existing maze of config variables. > > I agree that it's confusing. I'm obviously biased, but IMO it's less > confusing after my patchset than before. ;-) The state of the world > before my patchset set a pretty low bar. > > As far as I understand it, at an architecture-level you can choose any > _ONE_ of the following: > > a) Implement bits needed for the the "perf" hardlockup detector. x86 > has done this, some configs of powerpc do this, and arm64 now after my > patch series. This is HAVE_HARDLOCKUP_DETECTOR_PERF. > > b) Implement your own totally separate hardlockup detector that > doesn't use any of the common "perf" code but still looks the same to > userspace (same sysctls, etc). Only powerpc does this (in some > configs). As per conversations in previous versions of my patch > series, apparently powerpc's version is quite fancy and maybe someday > people can move some of these features to the common code. This is > HAVE_HARDLOCKUP_DETECTOR_ARCH. > > c) Don't implement the full features of a hardlockup detector but > still have the basics. In the very least, I think it doesn't support > the sysctls "hardlockup_panic" and "hardlockup_all_cpu_backtrace". It > doesn't support the kernel command line parameter "nmi_watchdog=". I > don't know for sure if there are any other differences. Only sparc64 > does this. This is HAVE_NMI_WATCHDOG. Confusingly, > HAVE_HARDLOCKUP_DETECTOR_ARCH selects HAVE_NMI_WATCHDOG. > > d) Don't implement _any_ hardlockup detector of any sort. After my > patchset you can still end up with "buddy" if you have SMP. > > One thing that would probably help would be to bring sparc64 to a full > "arch" hardlockup implementation and then get rid of the special case. > That seems a bit outside my scope, though if someone wanted to post > patches for that I'd be willing to give them a review. It would be nice but it might be problematic if we do not have access to the hardware. > I guess other than that, the best we could try to do is to rename some > configs and/or add some subconfigs to describe certain features? Maybe > HAVE_NMI_WATCHDOG => HAVE_HARDLOCKUP_DETECTOR_ARCH_BASIC_FEATURES > would help? I'd love to come up with a better name for > HAVE_HARDLOCKUP_DETECTOR_NON_ARCH but I couldn't come up with one. > Maybe the unwieldy "HAVE_HARDLOCKUP_DETECTOR_THAT_COUNTS_HRTIMER"? Renaming the config variables seems to be the best solution at the moment. IMHO, it would be nice to have something like: + CONFIG_HARDLOCKUP_DETECTOR for code shared by all hardlockup detectors + CONFIG_HARDLOCKUP_DETECTOR_PERF for code using kernel/watchdog_perf.c + CONFIG_HARDLOCKUP_DETECTOR_BUDDY for code using kernel/watchdog_buddy.c + HAVE_HARDLOCKUP_DETECTOR_PERF set by architectures that support using kernel/watchdog_perf.c + HAVE_HARDLOCKUP_DETECTOR_ARCH set by architectures that have alternative implementation of the hardlockup detector + CONFIG_HARDLOCKUP_DETECTOR_PREFER_BUDDY Allow to prefer the buddy detector when _PERF or _ARCH is available as well. + HAVE_HARDLOCKUP_PANIC + HAVE_HARDLOCKUP_ALL_CPU_BACKTRACE set when the earchitecture support these features and used for the sysfs interface > If you have concrete suggestions for what would be cleaner, let me > know and I can queue up a patch. ...or I'm happy to review a patch. I am not sure how complicated it would be to rename the config variables to somehing sane. I am sorry I do not have time to prepare the patches at the moment. I'll let Andrew to decide if he would require this cleanup to accept the patchset. Best Regards, Petr