> On 26-Aug-2019, at 8:59 AM, Michael Ellerman <mpe@xxxxxxxxxxxxxx> wrote:
>
> Sachin Sant <sachinp@xxxxxxxxxxxxxxxxxx> writes:
>> linux-next is currently broken on POWER8 non virtualized. Kernel
>> fails to reach login prompt with following kernel warning
>> repeatedly shown during boot.
>
> I don't see it on my test systems.
>
> The backtrace makes it look like you're doing CPU hot_un_plug during
> boot, which seems a bit odd.
>

There is no explicit hot unplug operation being done. This happens during boot.
For some reason CPUs are being offlined.

I had earlier reported that the kernel does not boot to the login prompt. I was wrong.
The kernel does boot. Not sure if it's a side effect of these warnings, but SMT is off
after the boot.

# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 80
On-line CPU(s) list: 0,8,16,24,32,40,48,56,64,72
Off-line CPU(s) list: 1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79
Thread(s) per core: 1
Core(s) per socket: 5
……..
NUMA node0 CPU(s): 0,8,16,24,32
NUMA node1 CPU(s): 40,48,56,64,72
#
# ppc64_cpu --smt
SMT is off
#

I can manually turn on the SMT.

> Or possibly it's just that the cpu_is_offline() test in do_idle() is
> returning true due to some bug.
>
>> The problem dates back atleast till next-20190816.
>
> A bisect would be helpful obviously :)

The last successful kernel boot was with next-20190808. It started failing with the
9th Aug tree. Will attempt a bisect.
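For anyone wanting to check the same condition on their own system, here is a minimal sketch that reads the on-line and off-line CPU lists from the lscpu output above and confirms that only the first thread of each core is online. The helper names are hypothetical, and the value of 8 threads per core is an assumption inferred from the numbers above (80 CPUs across 2 sockets of 5 cores), not something lscpu reports directly here.

```python
def expand_cpu_list(spec):
    """Expand an lscpu-style CPU list ('0,8,16' or '1-7,9-15') into sorted ints."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def online_threads_per_core(online_spec, threads_per_core=8):
    """Map core index -> thread offsets (within the core) that are online.

    threads_per_core=8 is an assumption (SMT8), derived from
    80 CPUs / (2 sockets * 5 cores) in the lscpu output.
    """
    cores = {}
    for cpu in expand_cpu_list(online_spec):
        cores.setdefault(cpu // threads_per_core, []).append(cpu % threads_per_core)
    return cores


# Values copied from the lscpu output in this mail.
online = "0,8,16,24,32,40,48,56,64,72"
offline = "1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79"

cores = online_threads_per_core(online)
# True when every core has exactly its first thread online,
# which matches what "ppc64_cpu --smt" reports as "SMT is off".
smt_off = all(threads == [0] for threads in cores.values())
print(len(cores), "cores with an online thread; SMT effectively off:", smt_off)
```

The same two helpers can be pointed at the contents of /sys/devices/system/cpu/online and /sys/devices/system/cpu/offline, which use the same list syntax.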
>
>> [ 40.285606] WARNING: CPU: 1 PID: 0 at arch/powerpc/platforms/powernv/smp.c:160 pnv_smp_cpu_kill_self+0x50/0x2d0
>> [ 40.285609] Modules linked in: kvm_hv kvm sunrpc dm_mirror dm_region_hash dm_log dm_mod ses enclosure scsi_transport_sas sg ipmi_powernv ipmi_devintf powernv_rng uio_pdrv_genirq uio leds_powernv ipmi_msghandler powernv_op_panel ibmpowernv ip_tables ext4 mbcache jbd2 sd_mod ipr tg3 libata ptp pps_core
>> [ 40.285643] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0-rc5-next-20190823-autotest-autotest #1
>> [ 40.285644] NIP: c0000000000b5f40 LR: c000000000055498 CTR: c0000000000b5ef0
>> [ 40.285646] REGS: c0000007f5527980 TRAP: 0700 Not tainted (5.3.0-rc5-next-20190823-autotest-autotest)
>> [ 40.285646] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24004028 XER: 00000000
>> [ 40.285650] CFAR: c000000000055494 IRQMASK: 1
>> [ 40.285650] GPR00: c000000000055498 c0000007f5527c10 c00000000148b200 0000000000000000
>> [ 40.285650] GPR04: 0000000000000000 c0000007fa897d80 c0000007fa90c800 00000007f9980000
>> [ 40.285650] GPR08: 0000000000000000 0000000000000001 0000000000000000 c0000007fa90c800
>> [ 40.285650] GPR12: c0000000000b5ef0 c0000007ffffee00 0000000000000800 c000000ffffc11d0
>> [ 40.285650] GPR16: 0000000000000001 c000000001035280 0000000000000000 c0000000015303c0
>> [ 40.285650] GPR20: c000000000052d60 0000000000000001 c0000007f54cd800 c0000007f54cd880
>> [ 40.285650] GPR24: 0000000000080000 c0000007f54cd800 c0000000014bdf78 c0000000014c20d8
>> [ 40.285650] GPR28: 0000000000000002 c0000000014c2538 0000000000000001 c0000007f54cd800
>> [ 40.285662] NIP [c0000000000b5f40] pnv_smp_cpu_kill_self+0x50/0x2d0
>> [ 40.285664] LR [c000000000055498] cpu_die+0x48/0x64
>> [ 40.285665] Call Trace:
>> [ 40.285667] [c0000007f5527c10] [c000000000f85f10] ppc64_tlb_batch+0x0/0x1220 (unreliable)
>> [ 40.285669] [c0000007f5527df0] [c000000000055498] cpu_die+0x48/0x64
>> [ 40.285672] [c0000007f5527e10] [c0000000000226a0] arch_cpu_idle_dead+0x20/0x40
>> [ 40.285674] [c0000007f5527e30] [c00000000016bd2c] do_idle+0x37c/0x3f0
>> [ 40.285676] [c0000007f5527ed0] [c00000000016bfac] cpu_startup_entry+0x3c/0x50
>> [ 40.285678] [c0000007f5527f00] [c000000000055198] start_secondary+0x638/0x680
>> [ 40.285680] [c0000007f5527f90] [c00000000000ac5c] start_secondary_prolog+0x10/0x14
>> [ 40.285680] Instruction dump:
>> [ 40.285681] fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821fe21 e90d1178
>> [ 40.285684] f9010198 39000000 892d0988 792907e0 <0b090000> 39200002 7d210164 39200003
>> [ 40.285687] ---[ end trace 72c90a064122d9e4 ]---
>
> That WARN shouldn't really kill the boot, do you see anything else?

The machine actually boots till login prompt. I have attached the boot log (5.3.0-rc4-next-20190814).

Thanks
-Sachin
Attachment:
boot.log
Description: Binary data