This is likely a legitimate bug: something took the kref's count
negative. (This was noticed because of the recent migration of kref from
atomic_t to refcount_t, which refuses to perform dangerous refcounting
actions.)

If I had to guess, I think it's dlpar_cpu_exists(), which is calling
of_node_put() on the child. I don't think that should be happening, but
I'm not actually familiar with this code. :)

-Kees

On Mon, Feb 27, 2017 at 1:35 AM, Sachin Sant <sachinp@xxxxxxxxxxxxxxxxxx> wrote:
> With Feb 27 next tree I am seeing inconsistent results on a CPU remove
> DLPAR operation on a POWER8 LPAR.
>
> After the cpu remove operation the SMT capability of the LPAR is disabled.
>
> # uname -r
> 4.10.0-next-20170227
> # ppc64_cpu --smt
> SMT=8
> # lscpu
> Architecture:          ppc64le
> Byte Order:            Little Endian
> CPU(s):                16
> On-line CPU(s) list:   0-15
> Thread(s) per core:    8
> Core(s) per socket:    1
> Socket(s):             2
> NUMA node(s):          4
> Model:                 2.1 (pvr 004b 0201)
> Model name:            POWER8 (architected), altivec supported
> L1d cache:             64K
> L1i cache:             32K
> L2 cache:              512K
> L3 cache:              8192K
> NUMA node0 CPU(s):
> NUMA node1 CPU(s):     0-7
> NUMA node3 CPU(s):
> NUMA node4 CPU(s):     8-15
>
> After a DLPAR operation (CPU remove: 2 to 1) all the CPUs seem to be
> removed. At the end of it I also see a warning at lib/refcount.c:128.
> SMT capability is shown as disabled. It should have remained at 8.
>
> # ppc64_cpu --smt
> Machine is not SMT capable
> lscpu output shows 8 online CPUs, with threads per core as 8.
>
> [root@alp12 ~]# lscpu
> Architecture:          ppc64le
> Byte Order:            Little Endian
> CPU(s):                8
> On-line CPU(s) list:   8-15
> Thread(s) per core:    8
> Core(s) per socket:    1
> Socket(s):             1
> NUMA node(s):          4
> Model:                 2.1 (pvr 004b 0201)
> Model name:            POWER8 (architected), altivec supported
> L1d cache:             64K
> L1i cache:             32K
> NUMA node0 CPU(s):
> NUMA node1 CPU(s):
> NUMA node3 CPU(s):
> NUMA node4 CPU(s):     8-15
> [root@alp12 ~]
>
> [ 196.910677] cpu 8 (hwid 8) Ready to die...
> [ 197.120324] cpu 9 (hwid 9) Ready to die...
> [ 197.290265] cpu 10 (hwid 10) Ready to die...
> [ 197.490234] cpu 11 (hwid 11) Ready to die...
> [ 197.630110] cpu 12 (hwid 12) Ready to die...
> [ 197.790094] cpu 13 (hwid 13) Ready to die...
> [ 197.980016] cpu 14 (hwid 14) Ready to die...
> [ 198.098137] cpu 15 (hwid 15) Ready to die...
> [ 198.210074] pseries-hotplug-cpu: Failed to release drc (10000008) for CPU PowerPC,POWER8, rc: -17
> [ 199.050648] cpu 0 (hwid 0) Ready to die...
> [ 199.220530] cpu 1 (hwid 1) Ready to die...
> [ 199.370459] cpu 2 (hwid 2) Ready to die...
> [ 199.600322] cpu 3 (hwid 3) Ready to die...
> [ 199.770259] cpu 4 (hwid 4) Ready to die...
> [ 199.960189] cpu 5 (hwid 5) Ready to die...
> [ 200.140145] cpu 6 (hwid 6) Ready to die...
> [ 200.258067] cpu 7 (hwid 7) Ready to die...
> [ 200.360320] refcount_t: underflow; use-after-free.
> [ 200.360371] ------------[ cut here ]------------
> [ 200.360385] WARNING: CPU: 10 PID: 7194 at lib/refcount.c:128 refcount_sub_and_test+0xb8/0xf0
> [ 200.360398] Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp rpadlpar_io rpaphp tun bridge stp llc kvm iptable_filter vmx_crypto pseries_rng rng_core binfmt_misc nfsd ip_tables x_tables autofs4
> [ 200.360472] CPU: 10 PID: 7194 Comm: drmgr Tainted: G W 4.10.0-next-20170227 #3
> [ 200.360478] task: c0000008b7222b00 task.stack: c0000008b72dc000
> [ 200.360483] NIP: c000000001b6b4b8 LR: c000000001b6b4b4 CTR: c000000001cefb50
> [ 200.360488] REGS: c0000008b72df860 TRAP: 0700 Tainted: G W (4.10.0-next-20170227)
> [ 200.360494] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [ 200.360506] CR: 22000422 XER: 00000007
> [ 200.360511] CFAR: c000000001faf738 SOFTE: 1
> [ 200.360511] GPR00: c000000001b6b4b4 c0000008b72dfae0 c00000000266c300 0000000000000026
> [ 200.360511] GPR04: c00000050fd8adb0 c00000050fda1660 0000000000419000 000000000000ff00
> [ 200.360511] GPR08: 0000000000000000 c00000000235143c 000000050da40000 00000000000001d7
> [ 200.360511] GPR12: 0000000000000000 c00000000ea82800 0000000000000000 0000000000000000
> [ 200.360511] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 200.360511] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 200.360511] GPR24: 0000000000000000 0000000010018430 c0000005dd05f520 c0000008b72dfe00
> [ 200.360511] GPR28: 0000000000000000 0000000000000016 0000000000000000 c0000008b71ffa18
> [ 200.360570] NIP [c000000001b6b4b8] refcount_sub_and_test+0xb8/0xf0
> [ 200.360575] LR [c000000001b6b4b4] refcount_sub_and_test+0xb4/0xf0
> [ 200.360578] Call Trace:
> [ 200.360582] [c0000008b72dfae0] [c000000001b6b4b4] refcount_sub_and_test+0xb4/0xf0 (unreliable)
> [ 200.360588] [c0000008b72dfb40] [c000000001b4b0dc] kobject_put+0x3c/0xa0
> [ 200.360595] [c0000008b72dfbb0] [c000000001e53bf4] of_node_put+0x24/0x40
> [ 200.360602] [c0000008b72dfbd0] [c00000000165b4f4] dlpar_cpu_release+0x74/0xf0
> [ 200.360608] [c0000008b72dfc20] [c0000000015e0e28] arch_cpu_release+0x38/0x70
> [ 200.360615] [c0000008b72dfc40] [c000000001c49eb0] cpu_release_store+0x40/0x70
> [ 200.360622] [c0000008b72dfc70] [c000000001c3d994] dev_attr_store+0x34/0x60
> [ 200.360629] [c0000008b72dfc90] [c00000000191bc44] sysfs_kf_write+0x64/0xa0
> [ 200.360634] [c0000008b72dfcb0] [c00000000191aa80] kernfs_fop_write+0x170/0x250
> [ 200.360641] [c0000008b72dfd00] [c00000000187c330] __vfs_write+0x40/0x1c0
> [ 200.360645] [c0000008b72dfd90] [c00000000187dc48] vfs_write+0xc8/0x240
> [ 200.360650] [c0000008b72dfde0] [c00000000187f8b0] SyS_write+0x60/0x110
> [ 200.360656] [c0000008b72dfe30] [c0000000015cb8e0] system_call+0x38/0xfc
> [ 200.360660] Instruction dump:
> [ 200.360663] 7d495378 419e0044 2f89ffff 7d434850 7f0a4840 79460020 41de001c 4099ffbc
> [ 200.360675] 3c62ffb6 38636af8 48444249 60000000 <0fe00000> 38210060 38600000 e8010010
> [ 200.360686] ---[ end trace 937482186422ac36 ]---
>
> I have attached the dmesg log.
>
> Thanks
> -Sachin
>
>

--
Kees Cook
Pixel Security
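
P.S. To illustrate the remark at the top about refcount_t refusing dangerous
refcounting actions, here is a minimal userspace sketch. It is not the kernel
implementation: it is single-threaded and non-atomic, and every name in it
(struct sketch_ref, sketch_ref_init, sketch_ref_get, sketch_ref_put) is
hypothetical. It only shows the general idea that an unbalanced put on a
count of zero is refused and reported rather than being allowed to wrap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, single-threaded stand-in for a checked reference count. */
struct sketch_ref {
    unsigned int refs;       /* current number of references */
    unsigned int underflows; /* number of puts that were refused */
};

static void sketch_ref_init(struct sketch_ref *r, unsigned int n)
{
    assert(r != NULL);
    r->refs = n;
    r->underflows = 0;
}

/* Take a reference; saturate at UINT_MAX instead of wrapping to 0. */
static void sketch_ref_get(struct sketch_ref *r)
{
    if (r->refs != UINT_MAX)
        r->refs++;
}

/*
 * Drop a reference. Returns true only when this call releases the
 * last reference, i.e. when the caller should free the object.
 */
static bool sketch_ref_put(struct sketch_ref *r)
{
    if (r->refs == UINT_MAX)
        return false; /* saturated: the object is never freed */

    if (r->refs == 0) {
        /* A plain counter would wrap here; refuse and report instead. */
        fprintf(stderr, "sketch_ref: underflow; use-after-free.\n");
        r->underflows++;
        return false;
    }

    r->refs--;
    return r->refs == 0;
}
```

Starting from a count of 1, one get followed by two puts is balanced and
releases the object on the second put. A third put is the unbalanced one:
the sketch leaves the count at zero, prints a warning, and returns false.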