Sachin Sant <sachinp@xxxxxxxxxxxxxxxxxx> writes:
> linux-next is currently broken on POWER8 non virtualized. Kernel
> fails to reach login prompt with following kernel warning
> repeatedly shown during boot.

I don't see it on my test systems.

The backtrace makes it look like you're doing CPU hot_un_plug during
boot, which seems a bit odd. Or possibly it's just that the
cpu_is_offline() test in do_idle() is returning true due to some bug.

> The problem dates back at least till next-20190816.

A bisect would be helpful obviously :)

> [ 40.285606] WARNING: CPU: 1 PID: 0 at arch/powerpc/platforms/powernv/smp.c:160 pnv_smp_cpu_kill_self+0x50/0x2d0
> [ 40.285609] Modules linked in: kvm_hv kvm sunrpc dm_mirror dm_region_hash dm_log dm_mod ses enclosure scsi_transport_sas sg ipmi_powernv ipmi_devintf powernv_rng uio_pdrv_genirq uio leds_powernv ipmi_msghandler powernv_op_panel ibmpowernv ip_tables ext4 mbcache jbd2 sd_mod ipr tg3 libata ptp pps_core
> [ 40.285643] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0-rc5-next-20190823-autotest-autotest #1
> [ 40.285644] NIP: c0000000000b5f40 LR: c000000000055498 CTR: c0000000000b5ef0
> [ 40.285646] REGS: c0000007f5527980 TRAP: 0700 Not tainted (5.3.0-rc5-next-20190823-autotest-autotest)
> [ 40.285646] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24004028 XER: 00000000
> [ 40.285650] CFAR: c000000000055494 IRQMASK: 1
> [ 40.285650] GPR00: c000000000055498 c0000007f5527c10 c00000000148b200 0000000000000000
> [ 40.285650] GPR04: 0000000000000000 c0000007fa897d80 c0000007fa90c800 00000007f9980000
> [ 40.285650] GPR08: 0000000000000000 0000000000000001 0000000000000000 c0000007fa90c800
> [ 40.285650] GPR12: c0000000000b5ef0 c0000007ffffee00 0000000000000800 c000000ffffc11d0
> [ 40.285650] GPR16: 0000000000000001 c000000001035280 0000000000000000 c0000000015303c0
> [ 40.285650] GPR20: c000000000052d60 0000000000000001 c0000007f54cd800 c0000007f54cd880
> [ 40.285650] GPR24: 0000000000080000 c0000007f54cd800 c0000000014bdf78 c0000000014c20d8
> [ 40.285650] GPR28: 0000000000000002 c0000000014c2538 0000000000000001 c0000007f54cd800
> [ 40.285662] NIP [c0000000000b5f40] pnv_smp_cpu_kill_self+0x50/0x2d0
> [ 40.285664] LR [c000000000055498] cpu_die+0x48/0x64
> [ 40.285665] Call Trace:
> [ 40.285667] [c0000007f5527c10] [c000000000f85f10] ppc64_tlb_batch+0x0/0x1220 (unreliable)
> [ 40.285669] [c0000007f5527df0] [c000000000055498] cpu_die+0x48/0x64
> [ 40.285672] [c0000007f5527e10] [c0000000000226a0] arch_cpu_idle_dead+0x20/0x40
> [ 40.285674] [c0000007f5527e30] [c00000000016bd2c] do_idle+0x37c/0x3f0
> [ 40.285676] [c0000007f5527ed0] [c00000000016bfac] cpu_startup_entry+0x3c/0x50
> [ 40.285678] [c0000007f5527f00] [c000000000055198] start_secondary+0x638/0x680
> [ 40.285680] [c0000007f5527f90] [c00000000000ac5c] start_secondary_prolog+0x10/0x14
> [ 40.285680] Instruction dump:
> [ 40.285681] fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821fe21 e90d1178
> [ 40.285684] f9010198 39000000 892d0988 792907e0 <0b090000> 39200002 7d210164 39200003
> [ 40.285687] ---[ end trace 72c90a064122d9e4 ]---

That WARN shouldn't really kill the boot. Do you see anything else?

> Relevant code snippet:
> 156         /*
> 157          * This hard disables local interrupts, ensuring we have no lazy
> 158          * irqs pending.
> 159          */
> 160         WARN_ON(irqs_disabled());     <<===
> 161         hard_irq_disable();
> 162         WARN_ON(lazy_irq_pending());

Even via the path shown above I think we should have IRQs enabled, but
I guess not.

cheers
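To make the order of the three quoted checks (smp.c lines 160 to 162) concrete, here is a toy userspace model in C of ppc64 lazy (soft) interrupt masking. It is a sketch under simplifying assumptions, not kernel code: every name in it (struct model_cpu, the model_*() functions, the MODEL_IRQ_* flags) is hypothetical. It only approximates the idea that local_irq_disable() sets a soft mask while leaving MSR[EE] set, that an interrupt arriving while soft-masked is recorded for later replay, and that hard_irq_disable() clears EE. In the model, WARN_ON(irqs_disabled()) fires whenever the caller has already soft-disabled interrupts, whether or not anything is pending; the IRQMASK: 1 in the register dump looks consistent with that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag values. They mirror the idea behind the ppc64
 * PACA irq_happened bits but are NOT the kernel's definitions. */
#define MODEL_IRQ_HARD_DIS 0x1u /* EE has been cleared */
#define MODEL_IRQ_EE       0x2u /* an external interrupt was deferred */

/* Hypothetical per-CPU state, loosely modelled on the PACA fields. */
struct model_cpu {
	bool soft_masked;          /* what irqs_disabled() reports */
	bool hard_enabled;         /* stands in for MSR[EE] */
	unsigned int irq_happened; /* interrupts deferred while soft-masked */
};

static void model_init(struct model_cpu *c)
{
	c->soft_masked = false;
	c->hard_enabled = true;
	c->irq_happened = 0;
}

/* Models local_irq_disable(): only the soft mask changes, EE stays set. */
static void model_local_irq_disable(struct model_cpu *c)
{
	c->soft_masked = true;
}

static bool model_irqs_disabled(const struct model_cpu *c)
{
	return c->soft_masked;
}

/* An external interrupt arrives. If soft-masked, it is only recorded
 * and EE is cleared so it cannot fire again until it is replayed. */
static void model_interrupt_arrives(struct model_cpu *c)
{
	assert(c->hard_enabled); /* cannot be delivered with EE clear */
	if (c->soft_masked) {
		c->irq_happened |= MODEL_IRQ_EE | MODEL_IRQ_HARD_DIS;
		c->hard_enabled = false;
	}
	/* Otherwise it would be handled immediately (not modelled). */
}

/* Models hard_irq_disable(): clear EE, set the soft mask, note it. */
static void model_hard_irq_disable(struct model_cpu *c)
{
	c->hard_enabled = false;
	c->soft_masked = true;
	c->irq_happened |= MODEL_IRQ_HARD_DIS;
}

/* Pending means something other than "we hard-disabled" was recorded. */
static bool model_lazy_irq_pending(const struct model_cpu *c)
{
	return (c->irq_happened & ~MODEL_IRQ_HARD_DIS) != 0;
}

/* Same order as the quoted snippet; returns how many of the two
 * WARN_ON conditions would have been true. */
static int model_kill_self_checks(struct model_cpu *c)
{
	int warnings = 0;

	if (model_irqs_disabled(c)) {       /* WARN_ON(irqs_disabled()) */
		fprintf(stderr, "WARN: irqs_disabled() on entry\n");
		warnings++;
	}
	model_hard_irq_disable(c);          /* hard_irq_disable() */
	if (model_lazy_irq_pending(c)) {    /* WARN_ON(lazy_irq_pending()) */
		fprintf(stderr, "WARN: lazy irq pending\n");
		warnings++;
	}
	return warnings;
}
```

In this model, a caller that arrives with interrupts enabled trips neither check; one that soft-disables first trips only the line 160 check; and if an interrupt was also deferred while soft-masked, the line 162 check fires as well.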