On Thu, Oct 5, 2017 at 10:21 AM, Abdul Haleem <abdhalee@xxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> CPU off on in a loop for single cpu results in kernel panic for
> 4.14.0-rc2-next-20170929
>
> Machine: Power 8 PowerVM LPAR
> Kernel: 4.14.0-rc2-next-20170929
> gcc: 5.1.1
> config : attached
>
> Steps to recreate:
> -----------------
> The issue is not reproducible all the time.
>
> The trace occurred when CPU toggle operation for cpu14 in a loop for 10
> iterations.
>
> the Faulting instruction address: 0xc00000000035465c
> maps to:
>
> 0xc00000000035465c is in deactivate_slab (mm/slub.c:261).
> 256
> 257     /* Returns the freelist pointer recorded at location ptr_addr. */
> 258     static inline void *freelist_dereference(const struct kmem_cache *s,
> 259                                              void *ptr_addr)
> 260     {
> 261             return freelist_ptr(s, (void *)*(unsigned long *)(ptr_addr),
> 262                                 (unsigned long)ptr_addr);
> 263     }
> 264
> 265     static inline void *get_freepointer(struct kmem_cache *s, void *object)

This looks like SLUB freelist corruption. The data address reported in
the dmesg below is 0x42, so the fault is a dereference of a bogus,
near-NULL freelist pointer rather than of NULL itself.
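For anyone following along, here is a rough user-space sketch of how the
next free object is looked up. It is not the kernel code. It assumes the
4.14-era scheme in which the next-free pointer is stored at
`object + offset` and, when CONFIG_SLAB_FREELIST_HARDENED is enabled, is
XORed with a per-cache secret and with its own storage address.
`struct cache_sketch` and every `sketch_*` name are hypothetical
stand-ins made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the few struct kmem_cache fields used here. */
struct cache_sketch {
	unsigned int offset;	/* byte offset of the free pointer in an object */
	int hardened;		/* models CONFIG_SLAB_FREELIST_HARDENED on/off */
	uintptr_t random;	/* per-cache secret, only used when hardened */
};

/* Same idea as freelist_ptr(): the transform is its own inverse (XOR). */
static void *sketch_freelist_ptr(const struct cache_sketch *s, void *ptr,
				 uintptr_t ptr_addr)
{
	if (!s->hardened)
		return ptr;
	return (void *)((uintptr_t)ptr ^ s->random ^ ptr_addr);
}

/* Load the stored link at object + offset and decode it. */
static void *sketch_get_freepointer(const struct cache_sketch *s, void *object)
{
	uintptr_t ptr_addr = (uintptr_t)object + s->offset;
	void *stored;

	/* This load is what faults if "object" is a garbage value. */
	memcpy(&stored, (void *)ptr_addr, sizeof(stored));
	return sketch_freelist_ptr(s, stored, ptr_addr);
}

/* Encode "next" and store it at object + offset. */
static void sketch_set_freepointer(const struct cache_sketch *s, void *object,
				   void *next)
{
	uintptr_t ptr_addr = (uintptr_t)object + s->offset;
	void *stored = sketch_freelist_ptr(s, next, ptr_addr);

	assert(object != NULL);
	memcpy((void *)ptr_addr, &stored, sizeof(stored));
}

/* Walk the list until the decoded link is NULL and count the objects. */
static size_t sketch_count_free(const struct cache_sketch *s, void *freelist)
{
	size_t n = 0;

	while (freelist) {
		freelist = sketch_get_freepointer(s, freelist);
		n++;
	}
	return n;
}
```

In this model, if the list head or any stored link is overwritten with a
small value such as 0x42, the next sketch_get_freepointer() call loads
from that address. That is the kind of access that would give a data
fault at 0x42, though the trace alone does not show how the value got
there.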
-Kees

>
> dmesg logs:
> -----------
> Unable to handle kernel paging request for data at address 0x00000042
> Faulting instruction address: 0xc00000000035465c
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE SMP NR_CPUS=2048 NUMA pSeries
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in: rpadlpar_io(E) rpaphp(E) xt_addrtype(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) iptable_filter(E) ip_tables(E) x_tables(E) nf_nat(E) nf_conntrack(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) vmx_crypto(E) rtc_generic(E) pseries_rng(E) autofs4(E)
> CPU: 15 PID: 2687 Comm: sh Tainted: G E 4.14.0-rc2-next-20170929-autotest #1
> task: c000000772342e00 task.stack: c000000772488000
> NIP: c00000000035465c LR: c000000000354ef8 CTR: c000000000354e90
> REGS: c00000077ff6b8f0 TRAP: 0300 Tainted: G E (4.14.0-rc2-next-20170929-autotest)
> MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28088822 XER: 20000000
> CFAR: c000000000008718 DAR: 0000000000000042 DSISR: 40000000 SOFTE: 0
> GPR00: c000000000354ef8 c00000077ff6bb70 c00000000159c600 c00000077e01f300
> GPR04: f000000001dcaec0 0000000000000010 000000000000006d 0000000000000001
> GPR08: 0000000001550000 0000000000000000 000000008155006d 00000000000000ff
> GPR12: 0000000088008828 c00000000e749d80 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000001 0000000000000002
> GPR24: c00000077e00fe40 0000000000000042 c000000772bb51c0 c00000077fee7d30
> GPR28: c00000077fee7d20 c00000077e01f300 c000000772bb51a8 f000000001dcaec0
> NIP [c00000000035465c] deactivate_slab.isra.22+0x19c/0x610
> LR [c000000000354ef8] flush_cpu_slab+0x68/0xc0
> Call Trace:
> [c00000077ff6bb70] [c00000000035488c] deactivate_slab.isra.22+0x3cc/0x610 (unreliable)
> [c00000077ff6bca0] [c000000000354ef8] flush_cpu_slab+0x68/0xc0
> [c00000077ff6bcd0] [c0000000001c7470] flush_smp_call_function_queue+0x120/0x1e0
> [c00000077ff6bd50] [c000000000048fac] smp_ipi_demux_relaxed+0x9c/0x110
> [c00000077ff6bd90] [c000000000093fc4] icp_hv_ipi_action+0x64/0xb0
> [c00000077ff6be00] [c000000000185a60] __handle_irq_event_percpu+0x90/0x2d0
> [c00000077ff6bec0] [c000000000185cdc] handle_irq_event_percpu+0x3c/0x90
> [c00000077ff6bf00] [c00000000018c894] handle_percpu_irq+0x84/0xd0
> [c00000077ff6bf30] [c0000000001840f4] generic_handle_irq+0x54/0x80
> [c00000077ff6bf60] [c000000000016f00] __do_irq+0x80/0x1d0
> [c00000077ff6bf90] [c00000000002b120] call_do_irq+0x14/0x24
> [c00000077248bde0] [c0000000000170e8] do_IRQ+0x98/0x140
> [c00000077248be30] [c000000000008ac4] hardware_interrupt_common+0x114/0x120
> Instruction dump:
> b0df0018 60420000 815f0018 55490bfe 5529f83e 7d294378 913f0018 7c2004ac
> e93f0000 792907a4 f93f0000 e93d0022 <7d59482a> 2faa0000 419e0054 7f3acb78
> ---[ end trace 1094995650f27c82 ]---
>
> Kernel panic - not syncing: Fatal exception in interrupt
> Dumping ftrace buffer:
> (ftrace buffer empty)
> ---[ end Kernel panic - not syncing: Fatal exception in interrupt
> ------------[ cut here ]------------
> WARNING: CPU: 15 PID: 2687 at kernel/sched/core.c:1178 set_task_cpu+0x200/0x260
> Modules linked in: rpadlpar_io(E) rpaphp(E) xt_addrtype(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) iptable_filter(E) ip_tables(E) x_tables(E) nf_nat(E) nf_conntrack(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) libcrc32c(E) vmx_crypto(E) rtc_generic(E) pseries_rng(E) autofs4(E)
> CPU: 15 PID: 2687 Comm: sh Tainted: G D E 4.14.0-rc2-next-20170929-autotest #1
> task: c000000772342e00 task.stack: c000000772488000
> NIP: c0000000001429c0 LR: c000000000143674 CTR: c00000000014fc00
> REGS: c00000077ff6ac60 TRAP: 0700 Tainted: G D E (4.14.0-rc2-next-20170929-autotest)
> MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28088884 XER: 00000002
> CFAR: c00000000014286c SOFTE: 0
> GPR00: c000000000143674 c00000077ff6aee0 c00000000159c600 c000000775159e00
> GPR04: 0000000000000000 0000000000000000 0000000000000001 0000000000000001
> GPR08: 0000000000000000 0000000000008000 c0000000015d1ed0 0000000000000002
> GPR12: 0000000028088224 c00000000e749d80 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 000000077ee10000
> GPR20: c00000077fed15c0 0000000000000000 0000000000000000 0000000000000000
> GPR24: c0000000015cdd78 c00000077515a238 c0000000010d6680 0000000000000000
> GPR28: 0000000000000000 0000000000000004 0000000000000000 c000000775159e00
> NIP [c0000000001429c0] set_task_cpu+0x200/0x260
> LR [c000000000143674] try_to_wake_up+0x1d4/0x5d0
> Call Trace:
> [c00000077ff6aee0] [c0000000010d6680] runqueues+0x0/0xb00 (unreliable)
> [c00000077ff6af20] [c000000000143674] try_to_wake_up+0x1d4/0x5d0
> [c00000077ff6afa0] [c0000000001674d4] autoremove_wake_function+0x44/0x80
> [c00000077ff6aff0] [c000000000166824] __wake_up_common+0xe4/0x1e0
> [c00000077ff6b060] [c0000000001669dc] __wake_up_common_lock+0xbc/0x110
> [c00000077ff6b0f0] [c000000000182f90] wake_up_klogd_work_func+0x60/0xa0
> [c00000077ff6b120] [c00000000026c7d0] irq_work_run_list+0xb0/0x100
> [c00000077ff6b170] [c0000000001a8450] update_process_times+0x60/0x90
> [c00000077ff6b1a0] [c0000000001c080c] tick_sched_handle.isra.5+0x5c/0xa0
> [c00000077ff6b1d0] [c0000000001c08b0] tick_sched_timer+0x60/0xe0
> [c00000077ff6b210] [c0000000001a9058] __hrtimer_run_queues+0xf8/0x360
> [c00000077ff6b290] [c0000000001a9ffc] hrtimer_interrupt+0xfc/0x350
> [c00000077ff6b360] [c000000000025324] __timer_interrupt+0x94/0x270
> [c00000077ff6b3b0] [c000000000025764] timer_interrupt+0xa4/0x110
> [c00000077ff6b3f0] [c000000000008f74] decrementer_common+0x114/0x120
> --- interrupt: 901 at replay_interrupt_return+0x0/0x4
> LR = arch_local_irq_restore+0x74/0x90
> [c00000077ff6b6e0] [c000000000e7fc00] num_spec.61222+0x178bbc/0x2287cc (unreliable)
> [c00000077ff6b700] [c000000000100d44] panic+0x2b0/0x30c
> [c00000077ff6b790] [c000000000026428] oops_end+0x1b8/0x1e0
> [c00000077ff6b810] [c000000000067d30] bad_page_fault+0xe0/0x160
> [c00000077ff6b880] [c00000000000a4c0] handle_page_fault+0x34/0x38
> --- interrupt: 300 at deactivate_slab.isra.22+0x19c/0x610
> LR = flush_cpu_slab+0x68/0xc0
> [c00000077ff6bb70] [c00000000035488c] deactivate_slab.isra.22+0x3cc/0x610 (unreliable)
> [c00000077ff6bca0] [c000000000354ef8] flush_cpu_slab+0x68/0xc0
> [c00000077ff6bcd0] [c0000000001c7470] flush_smp_call_function_queue+0x120/0x1e0
> [c00000077ff6bd50] [c000000000048fac] smp_ipi_demux_relaxed+0x9c/0x110
> [c00000077ff6bd90] [c000000000093fc4] icp_hv_ipi_action+0x64/0xb0
> [c00000077ff6be00] [c000000000185a60] __handle_irq_event_percpu+0x90/0x2d0
> [c00000077ff6bec0] [c000000000185cdc] handle_irq_event_percpu+0x3c/0x90
> [c00000077ff6bf00] [c00000000018c894] handle_percpu_irq+0x84/0xd0
> [c00000077ff6bf30] [c0000000001840f4] generic_handle_irq+0x54/0x80
> [c00000077ff6bf60] [c000000000016f00] __do_irq+0x80/0x1d0
> [c00000077ff6bf90] [c00000000002b120] call_do_irq+0x14/0x24
> [c00000077248bde0] [c0000000000170e8] do_IRQ+0x98/0x140
> [c00000077248be30] [c000000000008ac4] hardware_interrupt_common+0x114/0x120
> Instruction dump:
> e93d0019 2fa90000 409effd8 4bfffed8 893f0644 61290004 993f0644 4bffff10
> 0fe00000 4bfffe6c 60000000 60420000 <0fe00000> 4bfffeac 60000000 60420000
> ---[ end trace 1094995650f27c83 ]---
>
>
>
> --
> Regard's
>
> Abdul Haleem
> IBM Linux Technology Centre
>
>

--
Kees Cook
Pixel Security