Re: [PATCH] parisc: Switch from DISCONTIGMEM to SPARSEMEM

The stack pointer contains fffffff0f0000018:
24-27   000000000400c240  fffffffffffa2000  fffffff0f0000018  fffffff0f0412000

I presume that it is initialized incorrectly:

[    0.000000] Memory Ranges:
[    0.000000]  0) Start 0x0000000000000000 End 0x00000000efffffff Size   3840 MB
[    0.000000]  1) Start 0x00000010f0000000 End 0x00000010ffffffff Size    256 MB
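
A quick way to check an address from the PIM dump against these ranges
is sketched below. The range values are copied from the dmesg output;
the helper names `in_ram` and `size_mb` are made up for illustration and
are not kernel functions.

```python
# Memory ranges as reported in the J5000 dmesg (inclusive start and end).
MEMORY_RANGES = [
    (0x0000000000000000, 0x00000000EFFFFFFF),
    (0x00000010F0000000, 0x00000010FFFFFFFF),
]


def in_ram(addr):
    """Return True if addr lies inside one of the reported RAM ranges."""
    return any(start <= addr <= end for start, end in MEMORY_RANGES)


def size_mb(start, end):
    """Size of an inclusive address range in MB (1 MB = 2**20 bytes)."""
    return (end - start + 1) >> 20


if __name__ == "__main__":
    for start, end in MEMORY_RANGES:
        print(hex(start), hex(end), size_mb(start, end), "MB")
    # Value quoted above, and the r30 value from the PIM dump.
    for addr in (0xFFFFFFF0F0000018, 0x00000010FB0E6F60):
        print(hex(addr), "in RAM" if in_ram(addr) else "not in RAM")
```

With these ranges, fffffff0f0000018 falls outside both, while the r30
value in the PIM dump, 00000010fb0e6f60, falls inside range 1.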

Dave

On 2019-04-15 3:52 p.m., Sven Schnelle wrote:
> Hi,
>
> On Wed, Apr 10, 2019 at 07:39:11PM +0200, Helge Deller wrote:
>> The commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an
>> external fragmentation event occurs") breaks memory management on a
>> parisc c8000 workstation with this memory layout:
>>
>> 	0) Start 0x0000000000000000 End 0x000000003fffffff Size   1024 MB
>> 	1) Start 0x0000000100000000 End 0x00000001bfdfffff Size   3070 MB
>> 	2) Start 0x0000004040000000 End 0x00000040ffffffff Size   3072 MB
>>
>> With the patch 1c30844d2dfe, the kernel will incorrectly reclaim the
>> first zone when it fills up, ignoring the fact that there are two
>> completely free zones. Basically, it limits cache size to 1GiB.
>>
>> The parisc kernel is currently using the DISCONTIGMEM implementation,
>> but isn't NUMA. Avoid this issue and strange work-arounds by switching
>> to the more commonly used SPARSEMEM implementation.
>> [..]
> Unfortunately, this patch breaks booting on my J5000. The second CPU fails
> to start and triggers an HPMC (bus timeout). Booting with this patch and
> the nosmp command-line option works. On my C3750 there's no problem.
>
> Here's the dmesg:
>
> [    0.000000] Linux version 5.1.0-rc3-64bit+ (svens@t470p) (gcc version 7.4.0 (GCC)) #259 SMP Mon Apr 15 20:57:57 CEST 2019
> [    0.000000] CPU0: thread -1, cpu 0, socket 0
> [    0.000000] FP[0] enabled: Rev 1 Model 16
> [    0.000000] The 64-bit Kernel has started...
> [    0.000000] Kernel default page size is 4 KB. Huge pages disabled.
> [    0.000000] printk: bootconsole [ttyB0] enabled
> [    0.000000] Initialized PDC Console for debugging.
> [    0.000000] Determining PDC firmware type: System Map.
> [    0.000000] model 00005bd0 00000491 00000000 00000002 782482ee 100000f0 00000008 000000b2 000000b2
> [    0.000000] vers  00000201
> [    0.000000] CPUID vers 17 rev 5 (0x00000225)
> [    0.000000] capabilities 0x3
> [    0.000000] model 9000/785/J5000
> [    0.000000] Memory Ranges:
> [    0.000000]  0) Start 0x0000000000000000 End 0x00000000efffffff Size   3840 MB
> [    0.000000]  1) Start 0x00000010f0000000 End 0x00000010ffffffff Size    256 MB
> [    0.000000] Total Memory: 4096 MB
> [    0.000000] PDT: type PDT_PDC, size 50, entries 0, status 2, dbe_loc 0xffffffffffffffff, good_mem 171 MB
> [    0.000000] PDT: Firmware reports all memory OK.
> [    0.000000] LCD display at fffffff0f05d0008,fffffff0f05d0000 registered
> [    0.000000] percpu: Embedded 25 pages/cpu @(____ptrval____) s64064 r8192 d30144 u102400
> [    0.000000] SMP: bootstrap CPU ID is 0
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1032192
> [    0.000000] Kernel command line: HOME=/ root=/dev/sda4 panic_timeout=60 panic=10 console=ttyS0,9600 kgdboc=ttyS0,9600 palo_kernel=0/vmlinuz
> [    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
> [    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
> [    0.000000] Memory: 4102676K/4194304K available (5660K kernel code, 1638K rwdata, 940K rodata, 444K init, 932K bss, 91628K reserved, 0K cma-reserved)
> [    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [    0.000000] rcu: Hierarchical RCU implementation.
> [    0.000000] rcu: 	RCU event tracing is enabled.
> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> [    0.000000] NR_IRQS: 128
> [    0.000019] sched_clock: 64 bits at 440MHz, resolution 2ns, wraps every 4398046511103ns
> [    0.106184] Console: colour dummy device 160x64
> [    0.165835] Calibrating delay loop... 872.44 BogoMIPS (lpj=1744896)
> [    0.269845] pid_max: default: 32768 minimum: 301
> [    0.330589] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
> [    0.422023] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes)
> [    0.514921] *** VALIDATE proc ***
> [    0.558204] *** VALIDATE cgroup1 ***
> [    0.605858] *** VALIDATE cgroup2 ***
> [    0.655933] rcu: Hierarchical SRCU implementation.
> [    0.878065] smp: Bringing up secondary CPUs ...
> [    0.937862] smp: Brought up 1 node, 1 CPU
> [    0.994306] devtmpfs: initialized
> [    1.040437] random: get_random_u32 called from bucket_table_alloc+0x270/0x2a0 with crng_init=0
> [    1.154583] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
> [    1.281884] futex hash table entries: 1024 (order: 3, 32768 bytes)
> [    1.366411] NET: Registered protocol family 16
> [    1.426212] Searching for devices...
> [    1.717848] Found devices:
> [    1.749864] 1. Astro BC Runway Port at 0xfffffffffed00000 [10] { 12, 0x0, 0x582, 0x0000b }
> [    1.861898] 2. Elroy PCI Bridge at 0xfffffffffed30000 [10/0] { 13, 0x0, 0x782, 0x0000a }
> [    1.965864] 3. Elroy PCI Bridge at 0xfffffffffed32000 [10/1] { 13, 0x0, 0x782, 0x0000a }
> [    2.073861] 4. Elroy PCI Bridge at 0xfffffffffed34000 [10/2] { 13, 0x0, 0x782, 0x0000a }
> [    2.177861] 5. Elroy PCI Bridge at 0xfffffffffed38000 [10/4] { 13, 0x0, 0x782, 0x0000a }
> [    2.285861] 6. Elroy PCI Bridge at 0xfffffffffed3c000 [10/6] { 13, 0x0, 0x782, 0x0000a }
> [    2.393860] 7. Forte W 2-way at 0xfffffffffffa0000 [32] { 0, 0x0, 0x5bd, 0x00004 }
> [    2.493860] 8. Forte W 2-way at 0xfffffffffffa2000 [34] { 0, 0x0, 0x5bd, 0x00004 }
> [    2.593860] 9. Memory at 0xfffffffffed10200 [49] { 1, 0x0, 0x088, 0x00009 }
> [    2.681855] Enabling regular chassis codes support v0.05
> [    2.874416] CPU1: thread -1, cpu 0, socket 1
> [    2.935257] Releasing cpu 1 now, hpa=fffffffffffa2000
> [hangs here forever]
>
> One interesting detail is that if I reserve PAGE0 from the memory mem, at least the HPMC
> handler from the kernel is triggered:
>
> [    2.785875] Backtrace:
> [    2.785875]  [<00000000401d20b4>] smp_boot_one_cpu+0x15c/0x1e8
> [    2.785875]  [<00000000401d2270>] __cpu_up+0xe0/0xf0
> [    2.785875]  [<00000000401f2580>] bringup_cpu+0xa0/0x1e0
> [    2.785875]  [<00000000401f1770>] cpuhp_invoke_callback+0x118/0x848
> [    2.785875]  [<00000000401f4848>] do_cpu_up+0x290/0x3d8
> [    2.785875]  [<00000000401f49f8>] cpu_up+0x68/0x80
> [    2.785875]  [<000000004010c47c>] processor_probe+0x3ec/0x420
> [    2.785875]  [<00000000401cae7c>] parisc_driver_probe+0x6c/0x98
> [    2.785875]  [<000000004083cb20>] really_probe+0x398/0x560
> [    2.785875]  [<000000004083d1e8>] driver_probe_device+0x198/0x1a0
> [    2.785875]  [<000000004083d3d0>] __driver_attach+0x1e0/0x1e8
> [    2.785875]  [<0000000040837ba0>] bus_for_each_dev+0x108/0x170
> [    2.785875]  [<000000004083bbb8>] driver_attach+0x80/0x98
> [    2.785875]  [<000000004083ab70>] bus_add_driver+0x298/0x4b8
> [    2.785875]  [<000000004083e628>] driver_register+0xe0/0x268
> [    2.785875]  [<00000000401cb0a0>] register_parisc_driver+0xa0/0x118
> [    2.785875]  [<000000004010cb44>] processor_init+0x6c/0x80
> [    2.785875]  [<0000000040108348>] parisc_init+0x348/0x5c0
> [    2.785875]  [<00000000401b30bc>] do_one_initcall+0xb4/0x2c8
> [    2.785875]  [<0000000040102ac0>] kernel_init_freeable+0x5a0/0x730
> [    2.785875]  [<0000000040bb0890>] kernel_init+0x60/0x318
> [    2.785875]  [<00000000401be020>] ret_from_kernel_thread+0x20/0x28
> [    2.785875]
> [    2.785875]
> [    2.785875] High Priority Machine Check (HPMC): Code=1 (High-priority machine check (HPMC)) at addr 0000000000000000
> [    2.785875] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-64bit+ #116
> [    2.785875] Hardware name: 9000/785/J5000
> [    2.785875]
> [    2.785875]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
> [    2.785875] PSW: 00001000000001001111011100001111 Not tainted
> [    2.785875] r00-03  000000ff0804f70f 0000000040d27a80 0000000040bab790 00000000400ecfe0
> [    2.785875] r04-07  0000000040c58a80 0000000000000064 0000000000000002 0000000040ecee30
> [    2.785875] r08-11  00000000400ecfb0 0000000040c7ba80 0000000000000001 000000004106c858
> [    2.785875] r12-15  000000004106c860 0000000040f0c9e8 0000000000000064 0000000000000001
> [    2.785875] r16-19  0000000000000000 0000000040d29a80 0000000040c78280 000000005666f671
> [    2.785875] r20-23  0000000000000000 0000000000000000 00000000000001b8 000000000000abe0
> [    2.785875] r24-27  00000000400ecfe0 0000000000000000 0000000000000000 0000000040c58a80
> [    2.785875] r28-31  000000000000abe0 00000000400ed040 00000000400ed070 0000000056679e96
> [    2.785875] sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    2.785875] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    2.785875]
> [    2.785875] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040bab7c8 0000000040bab7cc
> [    2.785875]  IIR: 82953fb5    ISR: 0000000010340000  IOR: 000000003b4ed078
> [    2.785875]  CPU:        0   CR30: 00000000400ec000 CR31: 00000000ffffffff
> [    2.785875]  ORIG_R28: 0000000000000000
> [    2.785875]  IAOQ[0]: __udelay+0xe0/0x110
> [    2.785875]  IAOQ[1]: __udelay+0xe4/0x110
> [    2.785875]  RP(r2): __udelay+0xa8/0x110
> [    2.785875] Backtrace:
> [    2.785875]  [<00000000401d20b4>] smp_boot_one_cpu+0x15c/0x1e8
> [    2.785875]  [<00000000401d2270>] __cpu_up+0xe0/0xf0
> [    2.785875]  [<00000000401f2580>] bringup_cpu+0xa0/0x1e0
> [    2.785875]  [<00000000401f1770>] cpuhp_invoke_callback+0x118/0x848
> [    2.785875]  [<00000000401f4848>] do_cpu_up+0x290/0x3d8
> [    2.785875]  [<00000000401f49f8>] cpu_up+0x68/0x80
> [    2.785875]  [<000000004010c47c>] processor_probe+0x3ec/0x420
> [    2.785875]  [<00000000401cae7c>] parisc_driver_probe+0x6c/0x98
> [    2.785875]  [<000000004083cb20>] really_probe+0x398/0x560
> [    2.785875]  [<000000004083d1e8>] driver_probe_device+0x198/0x1a0
> [    2.785875]  [<000000004083d3d0>] __driver_attach+0x1e0/0x1e8
> [    2.785875]  [<0000000040837ba0>] bus_for_each_dev+0x108/0x170
> [    2.785875]  [<000000004083bbb8>] driver_attach+0x80/0x98
> [    2.785875]  [<000000004083ab70>] bus_add_driver+0x298/0x4b8
> [    2.785875]  [<000000004083e628>] driver_register+0xe0/0x268
> [    2.785875]  [<00000000401cb0a0>] register_parisc_driver+0xa0/0x118
> [    2.785875]  [<000000004010cb44>] processor_init+0x6c/0x80
> [    2.785875]  [<0000000040108348>] parisc_init+0x348/0x5c0
> [    2.785875]  [<00000000401b30bc>] do_one_initcall+0xb4/0x2c8
> [    2.785875]  [<0000000040102ac0>] kernel_init_freeable+0x5a0/0x730
> [    2.785875]  [<0000000040bb0890>] kernel_init+0x60/0x318
> [    2.785875]  [<00000000401be020>] ret_from_kernel_thread+0x20/0x28
> [    2.785875]
> [    2.785875] Kernel panic - not syncing: High Priority Machine Check (HPMC)
> [    2.785875] Rebooting in 10 seconds..
>
> PIM record shows:
>
> -----------------  Processor 1 HPMC Information ------------------
>
> Timestamp =
>   Thu Apr  11 13:35:21 GMT 2019    (20:19:04:11:13:35:21)
>
> HPMC Chassis Codes = 2cbf0  2510b  2cbf5  2cbfc
>
> General Registers 0 - 31
> 00-03   0000000000000000  0000000040000000  000000f0f0002090  0000000000000000
> 04-07   0000000000000e33  fffffff0f0400008  00000000000000fa  fffffff0f0002f68
> 08-11   fffffffffee003f8  00000000000000c4  000000000000000a  fffffff0f0001608
> 12-15   00000000000000f2  0000000000000001  0000000000000001  00000000000000f3
> 16-19   0000000002020202  0000000000000002  fffffff0f000016c  0440c24000000000
> 20-23   00000000000000cc  0000000000000001  0000000000000009  0000000000000000
> 24-27   000000000400c240  fffffffffffa2000  fffffff0f0000018  fffffff0f0412000
> 28-31   fffffffffffa2000  fffffff0f040ae70  00000010fb0e6f60  0000000000000000
>
> <Press any key to continue (q to quit)>
>
> Control Registers 0 - 31
> 00-03   0000000000000000  0000000000000000  0000000000000000  0000000000000000
> 04-07   0000000000000000  0000000000000000  0000000000000000  0000000000000000
> 08-11   0000000000000000  0000000000000000  0000000000000000  0000000000000000
> 12-15   0000000000000000  0000000000000000  0000000000000000  0000000000000000
> 16-19   0000001959e11c2f  0000000000000000  0000000000100274  000000000fd010de
> 20-23   00000000a637ffec  c0000000398e6f68  000000ff00007f08  0000000000000000
> 24-27   ffffffffffffffff  ffffffffffffffff  ffffffffffffffff  ffffffffffffffff
> 28-31   ffffffffffffffff  ffffffffffffffff  ffffffffffffffff  ffffffffffffffff
> Space Registers 0 - 7
>
> 00-03   00000000          00000000          00000000          00000000
> 04-07   00000000          00000000          00000000          00000000
>
> <Press any key to continue (q to quit)>
>
> IIA Space                    = 0x0000000000000000
> IIA Offset                   = 0x0000000000100278
> Check Type                   = 0x20000000
> CPU State                    = 0x9e000004
> Cache Check                  = 0x00000000
> TLB Check                    = 0x00000000
> Bus Check                    = 0x0030103b
> Assists Check                = 0x00000000
> Assist State                 = 0x00000000
> Path Info                    = 0x00000000
> System Responder Address     = 0x000000fffb0e6f68
> System Requestor Address     = 0xfffffffffffa2000
>
> 0000000040100250 <smp_slave_stext>:
>     40100250:   00 00 38 20     mtsp r0,sr4
>     40100254:   00 00 78 20     mtsp r0,sr5
>     40100258:   00 00 b8 20     mtsp r0,sr6
>     4010025c:   00 00 f8 20     mtsp r0,sr7
>     40100260:   23 d6 50 20     ldil L%106c800,sp
>     40100264:   37 de 00 b0     ldo 58(sp),sp
>     40100268:   0f c0 10 de     ldd 0(sp),sp
>     4010026c:   20 20 08 00     ldil L%40000000,r1
>     40100270:   08 3e 04 1e     sub sp,r1,sp
>
>     40100274:   0f d0 10 de     ldd 8(sp),sp        <-- HPMC
>
>     40100278:   03 de 18 40     mtctl sp,tr6
>     4010027c:   37 de 01 80     ldo c0(sp),sp
>     40100280:   20 94 20 20     ldil L%1029000,r4
>     40100284:   34 84 00 00     ldo 0(r4),r4
>     40100288:   03 04 18 40     mtctl r4,tr0
>     4010028c:   03 24 18 40     mtctl r4,tr1
>     40100290:   08 1a 02 43     copy r26,r3
>     40100294:   21 66 18 02     ldil L%4010c800,r11
>     40100298:   35 6b 0e c0     ldo 760(r11),r11
>     4010029c:   e8 1f 1c b5     b,l 401000fc <common_stext>,r0
>     401002a0:   08 00 02 40     nop
>
> sp (r30) is 00000010fb0e6f60, which is valid RAM. However, it's triggering an HPMC
> and the display shows a bus timeout. Does anyone have an idea what's going wrong?
>
> Regards,
> Sven
>
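
For reference, the address arithmetic around the faulting load can be
replayed from the PIM values. This is only a sketch: it assumes r30 in
the PIM dump holds the value after `sub sp,r1,sp`, and that
`ldd 8(sp),sp` reads the doubleword at sp + 8. The variable names are
illustrative.

```python
SP_R30 = 0x00000010FB0E6F60      # r30 from the PIM general registers
RESPONDER = 0x000000FFFB0E6F68   # System Responder Address from the PIM
KERNEL_OFFSET = 0x40000000       # constant from "ldil L%40000000,r1"

# Address read by "ldd 8(sp),sp".
target = SP_R30 + 8

# Value sp would have held before "sub sp,r1,sp" (assumption, see above).
before_sub = SP_R30 + KERNEL_OFFSET

# Bits in which the responder address differs from the load address.
diff = RESPONDER ^ target

if __name__ == "__main__":
    print("load address:      ", hex(target))
    print("sp before the sub: ", hex(before_sub))
    print("responder address: ", hex(RESPONDER))
    print("differing bits:    ", hex(diff))
```

The low 32 bits of sp + 8 and of the System Responder Address are
identical (fb0e6f68). Only the bits above bit 31 differ: 0x10 in the
load address versus 0xff in the responder address.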


-- 
John David Anglin  dave.anglin@xxxxxxxx



