On 12/21/22 at 12:46pm, Guilherme G. Piccoli wrote:
> On 20/12/2022 02:51, Baoquan He wrote:
> > On 12/20/22 at 01:41pm, Baoquan He wrote:
> >> On one intel bare metal system, I can randomly reproduce the kdump hang
> >> as below with tick_periodic call trace. Attach the kernel config for
> >> reference.
> >
> > Forgot mentioning this random hang is also caused by adding
> > 'nr_cpus=2' into normal kernel's cmdline, then triggering crash will get
> > kdump kernel hang as below kdump log shown.
> >
>
> The weird thing is that you seem to be using "nr_cpus=1" instead - this
> is the cmdline from the log:
>
> "nr_cpus=2 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off
> numa=off udev.children-max=2 panic=10 acpi_no_memhotplug
> transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0
> hugetlb_cma=0 disable_cpu_apicid=16 [...]"
>
> You seems to pass twice the "nr_cpus" thing, and I guess kernel pick the
> last one?

From the kdump kernel boot log, yes, nr_cpus=1 is the one taken.
parse_early_param() parses the kernel parameters one by one, so the
last occurrence takes effect.

Here, the problem is not whether nr_cpus is 2 or 1. The bare metal
system has 16 CPUs and only 2 of them are present; it seems the 14
halted CPUs get a wrong message and behave incorrectly, which causes
the issue.

>
> Also, what is "disable_cpu_apicid=16"? Could this be related?

Not really. Please check disable_cpu_apicid in
Documentation/admin-guide/kdump/kdump.rst; it is the BSP's APIC ID.
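
To illustrate the "last one takes effect" point for a repeated
parameter such as nr_cpus, here is a minimal sketch in Python. It is
only a model of the behaviour described above, not the kernel's
parser: effective_params() is a hypothetical helper name, the command
line is a shortened form of the one quoted from the log, and quoting
of values that contain spaces is ignored.

```python
from typing import Dict, Optional


def effective_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Model of last-occurrence-wins handling of a kernel command line.

    Hypothetical helper for illustration only. It walks the parameters
    left to right and lets each later occurrence overwrite the earlier
    one, which is the outcome described in the mail for nr_cpus.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        # "name=value" carries a value; a bare flag such as "irqpoll" does not.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Shortened form of the command line quoted from the kdump log.
cmdline = "nr_cpus=2 irqpoll nr_cpus=1 reset_devices disable_cpu_apicid=16"
print(effective_params(cmdline)["nr_cpus"])  # prints 1
```

With both nr_cpus=2 and nr_cpus=1 present, the model reports 1, which
matches what the kdump kernel boot log shows.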