Re: [powerpc/powervm]kernel BUG at mm/memory_hotplug.c:1864!

On 2018-06-26 20:24, Nathan Fontenot wrote:
On 06/12/2018 05:28 AM, Balbir Singh wrote:


On 11/06/18 17:41, vrbagal1 wrote:
On 2018-06-08 17:45, Oscar Salvador wrote:
On Fri, Jun 08, 2018 at 05:11:24PM +0530, vrbagal1 wrote:
On 2018-06-08 16:58, Oscar Salvador wrote:
On Fri, Jun 08, 2018 at 04:44:24PM +0530, vrbagal1 wrote:
Greetings!!!

I am seeing a kernel BUG followed by an oops message, after which the
system reboots, while running the dlpar memory hotplug test.

Machine Details: Power6 PowerVM Platform
GCC version: (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC))
Test case: dlpar memory hotplug test (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/memhotplug.py)
Kernel Version: Linux version 4.17.0-autotest

I am seeing this bug on rc7 as well.

Observing similar traces on linux next kernel: 4.17.0-next-20180608-autotest

 Block size [0x4000000] unaligned hotplug range: start 0x220000000, size 0x1000000

size < block_size in this case. Why? How? Could you confirm that the block size is 64MB and you're trying to remove 16MB?
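
For reference, the arithmetic behind that message can be sketched in userspace C. This is a simplified model of a block-alignment check, not the kernel code itself: the helper names `is_aligned` and `hotplug_range_ok` are hypothetical, and the only values taken from the log are the block size (0x4000000), the start address (0x220000000) and the size (0x1000000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when x is a multiple of align.
 * align is assumed to be a non-zero power of two. */
static bool is_aligned(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x & (align - 1)) == 0;
}

/* Simplified model of the range check: the range must be non-empty and
 * both its start and its size must be multiples of the memory block size. */
static bool hotplug_range_ok(uint64_t start, uint64_t size, uint64_t block_sz)
{
    return size != 0 && is_aligned(start, block_sz) && is_aligned(size, block_sz);
}
```

With the values from the log, `hotplug_range_ok(0x220000000, 0x1000000, 0x4000000)` is false: the start address is a multiple of 64MB, but the 16MB size is not, which would be consistent with the "unaligned hotplug range" message.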


I was not able to recreate this failure exactly (I don't have a Power6 system), but I was able to reproduce a similar failure on a Power9 with a few modifications.

I think the issue you're seeing is due to a change in the validation of memory
done in remove_memory to ensure the amount of memory being removed spans an
entire memory block. The pseries memory remove code (see
pseries_remove_memblock) tries to remove each section of a memory block
instead of the entire memory block.

Could you try the patch below, which updates the pseries code to remove the
entire memory block instead of doing it one section at a time?
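
The difference between the two approaches can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `SECTION_SZ` (16MB) and `BLOCK_SZ` (64MB) are taken from the sizes in the log above, and `range_ok`, `accepted_calls_per_section` and `accepted_calls_whole_block` are hypothetical names standing in for the validation and for the old and new calling patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SZ 0x1000000ULL /* 16MB, assumed from "size 0x1000000" in the log */
#define BLOCK_SZ   0x4000000ULL /* 64MB, from "Block size [0x4000000]" in the log */

/* Simplified stand-in for the validation: the range must be non-empty
 * and both start and size must be multiples of the block size. */
static bool range_ok(uint64_t start, uint64_t size)
{
    return size != 0 && start % BLOCK_SZ == 0 && size % BLOCK_SZ == 0;
}

/* Old pattern: one removal request per section of the block.
 * Returns how many of those requests pass the validation. */
static int accepted_calls_per_section(uint64_t base)
{
    int accepted = 0;
    uint64_t sections_per_block = BLOCK_SZ / SECTION_SZ;

    assert(BLOCK_SZ % SECTION_SZ == 0);
    for (uint64_t i = 0; i < sections_per_block; i++) {
        if (range_ok(base, SECTION_SZ))
            accepted++;
        base += SECTION_SZ;
    }
    return accepted;
}

/* New pattern: a single removal request covering the whole block. */
static int accepted_calls_whole_block(uint64_t base)
{
    return range_ok(base, BLOCK_SZ) ? 1 : 0;
}
```

In this model, for base 0x220000000 none of the four per-section requests passes the check, while the single whole-block request does.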

-Nathan


Hi Nathan,

With the patch below applied on 4.18.0-rc2, I am seeing the following oops message.

------------[ cut here ]------------
kernel BUG at mm/memory_hotplug.c:150!
Oops: Exception in kernel mode, sig: 5 [#1]
BE SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT cfg80211 nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 rfkill xt_conntrack nf_conntrack libcrc32c ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw iptable_filter ip_tables ses osst enclosure scsi_transport_sas ehea st uio_pdrv_genirq uio nfsd auth_rpcgss nfs_acl lockd grace sunrpc ipv6 crc_ccitt ext4 mbcache jbd2 sd_mod sr_mod cdrom dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 5 PID: 2925 Comm: drmgr Tainted: G W 4.18.0-rc2-00045-g671afc8 #2
NIP:  c0000000002cf278 LR: c0000000002c0c38 CTR: 0000000000000400
REGS: c0000002ac4ab150 TRAP: 0700 Tainted: G W (4.18.0-rc2-00045-g671afc8)
MSR:  8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 28002884  XER: 00000000
CFAR: c0000000002c0c00 IRQMASK: 0
GPR00: c0000000002c0c38 c0000002ac4ab3d0 c000000001159b00 c0000002b1091810
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000002b10
GPR08: c0000002b3fd0600 0000000000000001 0000000000000000 0000000000000220
GPR12: 0000000088002884 c00000000eeaa000 000000000002b400 0000000000024d00
GPR16: c0000002b3f8ca00 0000000000024c00 c0000000d3fc89c0 0000000000024d00
GPR20: 0000000000000003 0000000000000004 c0000002b3f7ca8c 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: c0000002b3fd0600 c0000002b1f7c6c0 c0000002b3f86224 c0000002b1091810
NIP [c0000000002cf278] .put_page_bootmem+0x28/0xf0
LR [c0000000002c0c38] .sparse_remove_one_section+0x228/0x2c0
Call Trace:
[c0000002ac4ab3d0] [c0000002ac4ab450] 0xc0000002ac4ab450 (unreliable)
[c0000002ac4ab450] [c0000000002c0c38] .sparse_remove_one_section+0x228/0x2c0
[c0000002ac4ab4f0] [c0000000002cf6f8] .__remove_pages+0x3b8/0x550
[c0000002ac4ab600] [c0000000008d32a4] .arch_remove_memory+0xb4/0x128
[c0000002ac4ab680] [c0000000002d1cd0] .remove_memory+0xb0/0x100
[c0000002ac4ab710] [c0000000000bc7b4] .pseries_remove_memblock+0x94/0xe0
[c0000002ac4ab790] [c0000000000bd3f8] .pseries_memory_notifier+0x248/0x260
[c0000002ac4ab820] [c000000000116ee8] .notifier_call_chain+0x78/0xf0
[c0000002ac4ab8c0] [c000000000117358] .__blocking_notifier_call_chain+0x58/0x90
[c0000002ac4ab960] [c000000000743e30] .of_property_notify+0x90/0xd0
[c0000002ac4aba10] [c00000000073ed04] .of_update_property+0x104/0x150
[c0000002ac4abac0] [c0000000000b045c] .ofdt_write+0x3bc/0x6f0
[c0000002ac4abb90] [c0000000003735b8] .proc_reg_write+0x78/0xc0
[c0000002ac4abc10] [c0000000002deaac] .__vfs_write+0x3c/0x200
[c0000002ac4abcf0] [c0000000002deeb0] .vfs_write+0xc0/0x230
[c0000002ac4abd90] [c0000000002df214] .ksys_write+0x54/0x100
[c0000002ac4abe30] [c00000000000b9dc] system_call+0x5c/0x70
Instruction dump:
60000000 60000000 7c0802a6 fbe1fff8 7c7f1b78 f8010010 f821ff81 e9230020
3929fff4 21290002 7d294910 7d2900d0 <0b090000> 7c0004ac 39230034 7d404828
---[ end trace 85b846899f1bdbb7 ]---


Regards,
Venkat.


---

arch/powerpc/platforms/pseries/hotplug-memory.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..6072efc793e1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -316,11 +316,11 @@ static int dlpar_offline_lmb(struct drmem_lmb *lmb)
 	return dlpar_change_lmb_state(lmb, false);
 }

-static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
+static int pseries_remove_memblock(unsigned long base,
+				   unsigned int memblock_sz)
 {
-	unsigned long block_sz, start_pfn;
-	int sections_per_block;
-	int i, nid;
+	unsigned long start_pfn;
+	int nid;

 	start_pfn = base >> PAGE_SHIFT;

@@ -329,18 +329,12 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 	if (!pfn_valid(start_pfn))
 		goto out;

-	block_sz = pseries_memory_block_size();
-	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
 	nid = memory_add_physaddr_to_nid(base);
-
-	for (i = 0; i < sections_per_block; i++) {
-		remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
-		base += MIN_MEMORY_BLOCK_SIZE;
-	}
+	remove_memory(nid, base, memblock_sz);

 out:
 	/* Update memory regions for memory remove */
-	memblock_remove(base, memblock_size);
+	memblock_remove(base, memblock_sz);
 	unlock_device_hotplug();
 	return 0;
 }



