On 2018-06-26 20:24, Nathan Fontenot wrote:
On 06/12/2018 05:28 AM, Balbir Singh wrote:
On 11/06/18 17:41, vrbagal1 wrote:
On 2018-06-08 17:45, Oscar Salvador wrote:
On Fri, Jun 08, 2018 at 05:11:24PM +0530, vrbagal1 wrote:
On 2018-06-08 16:58, Oscar Salvador wrote:
On Fri, Jun 08, 2018 at 04:44:24PM +0530, vrbagal1 wrote:
Greetings!!!
I am seeing a kernel BUG followed by an oops message, and the system
reboots, while running the dlpar memory hotplug test.
Machine Details: Power6 PowerVM Platform
GCC version: (gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC))
Test case: dlpar memory hotplug test
(https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/memhotplug.py)
Kernel Version: Linux version 4.17.0-autotest
I am seeing this bug on rc7 as well.
I observe similar traces on the linux-next kernel 4.17.0-next-20180608-autotest:
Block size [0x4000000] unaligned hotplug range: start 0x220000000, size 0x1000000
Here size < block_size. Why, and how? Could you confirm that the
block size is 64MB and that you're trying to remove 16MB?
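The arithmetic behind that message can be checked in userspace. This is only a minimal sketch, assuming the validation is a plain alignment test of both start and size against the memory block size; the helper name range_is_block_aligned is hypothetical and is not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper (not a kernel function): returns true
 * when both the start address and the size of a hotplug range are
 * multiples of the memory block size. block_size must be a power of two.
 */
static bool range_is_block_aligned(uint64_t start, uint64_t size,
				   uint64_t block_size)
{
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

	/* An empty range is never a valid hotplug range. */
	if (size == 0)
		return false;

	return (start & (block_size - 1)) == 0 &&
	       (size & (block_size - 1)) == 0;
}
```

With the values from the message (start 0x220000000, size 0x1000000, block size 0x4000000) the helper returns false: the start is a multiple of 64MB, but the 16MB size is not.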
I was not able to re-create this failure exactly (I don't have a Power6
system), but I was able to get a similar re-create on a Power 9 with a
few modifications.
I think the issue you're seeing is due to a change in the validation done
in remove_memory to ensure that the amount of memory being removed spans
entire memory blocks. The pseries memory remove code (see
pseries_remove_memblock) tries to remove each section of a memory block
instead of the entire memory block.
Could you try the patch below, which updates the pseries code to remove
the entire memory block instead of doing it one section at a time?
-Nathan
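To illustrate the difference between the two call patterns, here is a small userspace sketch. It assumes a 64MB memory block and 16MB sections, as the log message suggests, and it models the validation as a simple alignment test. All function names are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the block-size validation; not kernel code. */
static int range_ok(uint64_t start, uint64_t size, uint64_t block_sz)
{
	assert(block_sz != 0);
	return size != 0 && start % block_sz == 0 && size % block_sz == 0;
}

/* Old pattern: one removal request per section of the block. */
static int accepted_per_section(uint64_t base, uint64_t block_sz,
				uint64_t section_sz)
{
	int accepted = 0;
	uint64_t off;

	assert(section_sz != 0);
	for (off = 0; off < block_sz; off += section_sz)
		accepted += range_ok(base + off, section_sz, block_sz);
	return accepted;
}

/* New pattern: a single removal request covering the whole block. */
static int accepted_whole_block(uint64_t base, uint64_t block_sz)
{
	return range_ok(base, block_sz, block_sz);
}
```

Under these assumptions, none of the four per-section requests for the block at 0x220000000 passes the check, while the single whole-block request does.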
Hi Nathan,
With the patch below applied on 4.18.0-rc2, I am seeing the following oops message.
------------[ cut here ]------------
kernel BUG at mm/memory_hotplug.c:150!
Oops: Exception in kernel mode, sig: 5 [#1]
BE SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: rpadlpar_io rpaphp nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
nf_defrag_ipv6 ipt_REJECT cfg80211 nf_reject_ipv4 nf_conntrack_ipv4
nf_defrag_ipv4 rfkill xt_conntrack nf_conntrack libcrc32c ebtable_nat
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle
iptable_security iptable_raw iptable_filter ip_tables ses osst enclosure
scsi_transport_sas ehea st uio_pdrv_genirq uio nfsd auth_rpcgss nfs_acl
lockd grace sunrpc ipv6 crc_ccitt ext4 mbcache jbd2 sd_mod sr_mod cdrom
dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 5 PID: 2925 Comm: drmgr Tainted: G W 4.18.0-rc2-00045-g671afc8 #2
NIP: c0000000002cf278 LR: c0000000002c0c38 CTR: 0000000000000400
REGS: c0000002ac4ab150 TRAP: 0700 Tainted: G W (4.18.0-rc2-00045-g671afc8)
MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 28002884 XER: 00000000
CFAR: c0000000002c0c00 IRQMASK: 0
GPR00: c0000000002c0c38 c0000002ac4ab3d0 c000000001159b00 c0000002b1091810
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000002b10
GPR08: c0000002b3fd0600 0000000000000001 0000000000000000 0000000000000220
GPR12: 0000000088002884 c00000000eeaa000 000000000002b400 0000000000024d00
GPR16: c0000002b3f8ca00 0000000000024c00 c0000000d3fc89c0 0000000000024d00
GPR20: 0000000000000003 0000000000000004 c0000002b3f7ca8c 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: c0000002b3fd0600 c0000002b1f7c6c0 c0000002b3f86224 c0000002b1091810
NIP [c0000000002cf278] .put_page_bootmem+0x28/0xf0
LR [c0000000002c0c38] .sparse_remove_one_section+0x228/0x2c0
Call Trace:
[c0000002ac4ab3d0] [c0000002ac4ab450] 0xc0000002ac4ab450 (unreliable)
[c0000002ac4ab450] [c0000000002c0c38] .sparse_remove_one_section+0x228/0x2c0
[c0000002ac4ab4f0] [c0000000002cf6f8] .__remove_pages+0x3b8/0x550
[c0000002ac4ab600] [c0000000008d32a4] .arch_remove_memory+0xb4/0x128
[c0000002ac4ab680] [c0000000002d1cd0] .remove_memory+0xb0/0x100
[c0000002ac4ab710] [c0000000000bc7b4] .pseries_remove_memblock+0x94/0xe0
[c0000002ac4ab790] [c0000000000bd3f8] .pseries_memory_notifier+0x248/0x260
[c0000002ac4ab820] [c000000000116ee8] .notifier_call_chain+0x78/0xf0
[c0000002ac4ab8c0] [c000000000117358] .__blocking_notifier_call_chain+0x58/0x90
[c0000002ac4ab960] [c000000000743e30] .of_property_notify+0x90/0xd0
[c0000002ac4aba10] [c00000000073ed04] .of_update_property+0x104/0x150
[c0000002ac4abac0] [c0000000000b045c] .ofdt_write+0x3bc/0x6f0
[c0000002ac4abb90] [c0000000003735b8] .proc_reg_write+0x78/0xc0
[c0000002ac4abc10] [c0000000002deaac] .__vfs_write+0x3c/0x200
[c0000002ac4abcf0] [c0000000002deeb0] .vfs_write+0xc0/0x230
[c0000002ac4abd90] [c0000000002df214] .ksys_write+0x54/0x100
[c0000002ac4abe30] [c00000000000b9dc] system_call+0x5c/0x70
Instruction dump:
60000000 60000000 7c0802a6 fbe1fff8 7c7f1b78 f8010010 f821ff81 e9230020
3929fff4 21290002 7d294910 7d2900d0 <0b090000> 7c0004ac 39230034 7d404828
---[ end trace 85b846899f1bdbb7 ]---
Regards,
Venkat.
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index c1578f54c626..6072efc793e1 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -316,11 +316,11 @@ static int dlpar_offline_lmb(struct drmem_lmb *lmb)
 	return dlpar_change_lmb_state(lmb, false);
 }
 
-static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
+static int pseries_remove_memblock(unsigned long base,
+				   unsigned int memblock_sz)
 {
-	unsigned long block_sz, start_pfn;
-	int sections_per_block;
-	int i, nid;
+	unsigned long start_pfn;
+	int nid;
 
 	start_pfn = base >> PAGE_SHIFT;
 
@@ -329,18 +329,12 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 	if (!pfn_valid(start_pfn))
 		goto out;
 
-	block_sz = pseries_memory_block_size();
-	sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
 	nid = memory_add_physaddr_to_nid(base);
-
-	for (i = 0; i < sections_per_block; i++) {
-		remove_memory(nid, base, MIN_MEMORY_BLOCK_SIZE);
-		base += MIN_MEMORY_BLOCK_SIZE;
-	}
+	remove_memory(nid, base, memblock_sz);
 
 out:
 	/* Update memory regions for memory remove */
-	memblock_remove(base, memblock_size);
+	memblock_remove(base, memblock_sz);
 	unlock_device_hotplug();
 	return 0;
 }