On Tue, 2018-12-11 at 23:39 -0500, Qian Cai wrote:
> [+ kexec@xxxxxxxxxxxxxxxxxxx]
>
> The debugging progress so far...
>
> Waiting up to 5 minutes for other CPUs to stop in crash_smp_send_stop() made
> no difference.
>
> With "dev" branch of this tree [1], it is possible to print out messages from
> purgatory when passing something like "--port=0x602B0000
> --port-lsr=0x602B0000,0x80" to kexec. However, even enable_dcache() in
> setup_arch() hangs forever on this machine (it works fine on another arm64
> server - Cortex-A72). After removing only enable_dcache() / disable_dcache()
> from setup_arch() etc. without removing the printf() lines, it did print out,
>
> I'm in purgatory
> purgatory: entry=0000000090080000
> purgatory: dtb=0000000092d50000
> purgatory: D-cache Enabled before SHA verification
> purgatory: D-cache Disabled after SHA verification
>
> So, it confirmed that it must hang somewhere in arm64/kernel/head.S (.stext)
> or the early part of start_kernel() before earlycon was initialized.
>
> Also confirmed that passing nr_cpus=64 in the first kernel would again make
> everything work fine with this new kexec.
>
> Since enable_dcache() would hang as well, I suspect this has something to do
> with enabling the MMU (i.e., .stext -> __primary_switch -> __enable_mmu)
> coupled with some sort of per-CPU data where the number of CPUs matters.

Still debugging a hang when enabling the MMU (enable_dcache) in purgatory [1],
which may provide some clues for the later hang in the 2nd kernel.

        dsb     nshst
        tlbi    alle2
        dsb     nsh
        isb
        bl      get_ips_bits
        lsl     x1, x0, #TCR_IPS_EL2_SHIFT
        orr     x1, x1, x7
        mov     x0, x6
        ldr     x2, =MEMORY_ATTRIBUTES
        msr     mair_el2, x2
        msr     tcr_el2, x1
        msr     ttbr0_el2, x0
        isb
        mrs     x0, sctlr_el2
        ldr     x3, =SCTLR_ELx_FLAGS
        orr     x0, x0, x3
        msr     sctlr_el2, x0   <--- hangs right on this instruction.

Without CONFIG_ARM64_VHE (i.e., running in EL1), it is able to run
enable_dcache(), but it still hangs somewhere later in the 2nd kernel.

        dsb     nshst
        tlbi    vmalle1
        dsb     nsh
        isb
        bl      get_ips_bits
        lsl     x1, x0, #TCR_IPS_EL1_SHIFT
        orr     x1, x1, x7
        mov     x0, x6
        ldr     x2, =MEMORY_ATTRIBUTES
        msr     mair_el1, x2
        msr     tcr_el1, x1
        msr     ttbr0_el1, x0
        isb
        mrs     x0, sctlr_el1
        ldr     x3, =SCTLR_ELx_FLAGS
        orr     x0, x0, x3
        msr     sctlr_el1, x0
        isb

One data point about this system is that it has 4 threads per core. Every 2
cores share the same L1 and L2 caches, so 8 CPUs share each of them. All CPUs
share the same L3 cache.

Hence, I wonder if this is because of incomplete cache/TLB invalidation that
left stale entries (or uninitialised junk which just happens to look valid)
present before turning the MMU on.

[1] https://github.com/pratyushanand/kexec-tools/blob/devel/purgatory/arch/arm64/cache.S

>
> Right now, I think I need to find a way to print directly to the pl011 serial
> console while debugging that assembly code, something like CONFIG_DEBUG_LL
> but for arm64, so it can be used to locate where exactly it hangs. Otherwise,
> I am shooting in the dark.
>
> [1] https://github.com/pratyushanand/kexec-tools
>
> === original email ===
>
> On this HPE Apollo 70 arm64 server with 256 CPUs, triggering a crash dump just
> hung (4.20-rc6 as well as 4.18). It was confirmed that execution went as far
> as entering __cpu_soft_restart(),
>
> __crash_kexec
> machine_kexec
> cpu_soft_restart
> restart
> __cpu_soft_restart
>
> The earlycon was enabled but had no output from the 2nd kernel, so it was
> pretty much stuck somewhere in the assembly code in arm64/kernel/head.S or
> the early part of start_kernel() before earlycon was initialized.
>
> It turned out this has something to do with nr_cpus in the 1st kernel,
> although the 2nd kernel always has nr_cpus=1 [1]. It was tested with both
> crashkernel=512M and 768M.
>
> nr_cpus <= 96 GOOD (2nd kernel was up in 2-3 mins.)
> nr_cpus=256   BAD (2nd kernel was NOT up after 1 hour.)
> nr_cpus=127   BAD (2nd kernel was NOT up after 10 mins.)
>
> I also tested with and without CONFIG_ARM64_VHE (i.e., el2_switch), which
> made no difference.
>
> [1] KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 swiotlb=noforce reset_devices"
>
> I am still figuring out a way to debug that assembly code to find where it
> actually hangs, but the server is hooked up to a conserver that is not able
> to generate any sysrq and I have no shell access to the conserver, so it
> seems a bit difficult to use kgdb or kdb in this case.
>
> CPU information,
>
> # lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              256
> On-line CPU(s) list: 0-255
> Thread(s) per core:  4
> Core(s) per socket:  32
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           Cavium
> Model:               1
> Model name:          ThunderX2 99xx
> Stepping:            0x1
> BogoMIPS:            400.00
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-127
> NUMA node1 CPU(s):   128-255
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid asimdrdm
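
On the idea of printing directly to the pl011 console while the MMU is off: a
minimal sketch in C of polled PL011 output, assuming the UART was already set
up by firmware or the 1st kernel and that the caller knows its physical base
address (not confirmed anywhere in this thread). The helper names pl011_reg(),
pl011_putc() and pl011_puts() are hypothetical; only the register offsets and
flag bits come from the PL011 register map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PL011 register offsets and flag bits. */
#define UARTDR      0x00u       /* data register */
#define UARTFR      0x18u       /* flag register */
#define UARTFR_BUSY (1u << 3)   /* UART is still transmitting */
#define UARTFR_TXFF (1u << 5)   /* transmit FIFO is full */

/* Hypothetical helper: pointer to the 32-bit register at byte offset off. */
static inline volatile uint32_t *pl011_reg(volatile void *base, size_t off)
{
    return (volatile uint32_t *)((volatile uint8_t *)base + off);
}

/* Busy-wait until the TX FIFO has room, then write one character. */
static void pl011_putc(volatile void *base, char c)
{
    assert(base != NULL);
    while (*pl011_reg(base, UARTFR) & UARTFR_TXFF)
        ;
    *pl011_reg(base, UARTDR) = (uint8_t)c;
}

/* Print a string, sending CR before each LF, then wait for the UART to drain. */
static void pl011_puts(volatile void *base, const char *s)
{
    while (*s) {
        if (*s == '\n')
            pl011_putc(base, '\r');
        pl011_putc(base, *s++);
    }
    while (*pl011_reg(base, UARTFR) & UARTFR_BUSY)
        ;
}
```

Because it only polls the flag register and writes the data register, the
same sequence is short enough to be hand-translated into a few instructions
around the msr sctlr_elX write to see how far execution gets.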