Kernel oops

I have retried compiling the 3.4 kernel, this time on Squeeze. The
kernel compiles fine and boots with up to 255 cores; however, when
booting with more than 255 cores it fails with the kernel oops below
(the kernel is compiled with support for 512 CPUs). Here is the boot
log with the option bootmem_debug=1. I have shortened the boot log to
what I think are the important parts; if anyone needs the full 35M
boot log, I will gladly send it as an attachment.

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.4.49 (beeij@debian-hpc) (gcc version
4.4.5 (Debian 4.4.5-8) ) #16 SMP Thu Aug 29 14:41:59 CDT 2013
[    0.000000] EFI v1.10 by INTEL: SALsystab=0x1802c2d990 ACPI 2.0=0x1802c2da80
[    0.000000] booting generic kernel on platform sn2
[    0.000000] console [sn_sal0] enabled
[    0.000000] ACPI: RSDP 0000001802c2da80 00024 (v02    SGI)
[    0.000000] ACPI: XSDT 0000001802c38df0 00044 (v01    SGI  XSDTSN2
00010001    ? 00000094)
[    0.000000] ACPI: APIC 0000001802c2f5a0 0152C (v01    SGI  APICSN2
00010001    ? 00000001)
[    0.000000] ACPI: SRAT 0000001802c30ae0 02DB0 (v01    SGI  SRATSN2
00010001    ? 00000001)
[    0.000000] ACPI: SLIT 0000001802c338a0 0312C (v01    SGI  SLITSN2
00010001    ? 00000001)
[    0.000000] ACPI: FACP 0000001802c369e0 000F4 (v03    SGI  FACPSN2
00030001    ? 00000001)
[    0.000000] ACPI Warning: 32/64X length mismatch in Pm1aEventBlock:
32/0 (20120320/tbfadt-548)
[    0.000000] ACPI Warning: 32/64X length mismatch in
Pm1aControlBlock: 16/0 (20120320/tbfadt-548)
[    0.000000] ACPI Warning: 32/64X length mismatch in PmTimerBlock:
32/0 (20120320/tbfadt-548)
[    0.000000] ACPI Warning: 32/64X length mismatch in Gpe0Block: 64/0
(20120320/tbfadt-548)
[    0.000000] ACPI Warning: Invalid length for Pm1aEventBlock: 0,
using default 32 (20120320/tbfadt-629)
[    0.000000] ACPI Warning: Invalid length for Pm1aControlBlock: 0,
using default 16 (20120320/tbfadt-629)
[    0.000000] ACPI Warning: Invalid length for PmTimerBlock: 0, using
default 32 (20120320/tbfadt-629)
[    0.000000] ACPI: DSDT 0000001802c3af20 00024 (v02    SGI  DSDTSN2
00020001    ? 00002483)
[    0.000000] ACPI: FACS 0000001802c2e1e0 00040
[    0.000000] ACPI: Local APIC address c0000000fee00000
[    0.000000] 448 CPUs available, 448 CPUs total
[    0.000000] Number of logical nodes in system = 112
[    0.000000] Number of memory chunks in system = 112
[    0.000000] SMP: Allowing 448 CPUs, 0 hotplug CPUs

[=================SNIP=======================]
[    0.000000] On node 63 totalpages: 504832
[    0.000000] free_area_init_node: node 63, pgdat e0000fd8040c1f80,
node_mem_map a0007ff57d62a000
[    0.000000]   DMA zone: 2650 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 502182 pages, LIFO batch:7
[    0.000000] bootmem::alloc_bootmem_core nid=63 size=18 [1 pages]
align=80 goal=4000000000000 limit=0
[    0.000000] bootmem::__reserve nid=63 start=3f601318 end=3f601319 flags=1
[    0.000000] bootmem::alloc_bootmem_core nid=63 size=18000 [6 pages]
align=80 goal=4000000000000 limit=0
[    0.000000] bootmem::__reserve nid=63 start=3f601319 end=3f60131f flags=1
[    0.000000] Could not find start_pfn for node 64
[    0.000000] On node 64 totalpages: 0
[    0.000000] free_area_init_node: node 64, pgdat e000101804102000,
node_mem_map a0007ff5b562a000
[    0.000000] Could not find start_pfn for node 65
[    0.000000] On node 65 totalpages: 0
[    0.000000] free_area_init_node: node 65, pgdat e000105804142080,
node_mem_map a0007ff5ed62a000
[    0.000000] Could not find start_pfn for node 66
[    0.000000] On node 66 totalpages: 0
[=================SNIP=======================]

[    0.000000] BUG: Bad page state in process swapper  pfn:40601318
[    0.000000] page:a0007ff5b5642d40 count:0 mapcount:1 mapping:
   (null) index:0x0
[    0.000000] page flags: 0x0()
[    0.000000] Modules linked in:
[    0.000000] Unable to handle kernel NULL pointer dereference
(address 0000000000000018)
[    0.000000] swapper[0]: Oops 11003706212352 [1]
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, CPU 0, comm:              swapper
[    0.000000] psr : 00001210084a2018 ifs : 800000000000cc18 ip  :
[<a0000001003ea1b1>]    Not tainted (3.4.49)
[    0.000000] ip is at __copy_user+0x891/0x950
[    0.000000] unat: 0000000000000000 pfs : 0000000000000792 rsc :
0000000000000003
[    0.000000] rnat: 0000000000000000 bsps: 0000000000000000 pr  :
0bad0bad0baa55a9
[    0.000000] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr:
0009804c8a70433f
[    0.000000] csd : 0000000000000000 ssd : 0000000000000000
[    0.000000] b0  : a000000100043430 b6  : a000000100043660 b7  :
a00000010000c3b0
[    0.000000] f6  : 000000000000000000000 f7  : 1003e9e3779b97f4a7c16
[    0.000000] f8  : 1003e0a00000010001577 f9  : 10006c7fffffffd73ea5c
[    0.000000] f10 : 1003e0000000000000000 f11 : 1003e0044b82fa09b5a53
[    0.000000] r1  : a000000100dfa9e0 r2  : a000000100ac75f0 r3  :
a000000100ac75f8
[    0.000000] r8  : 0000000000000298 r9  : 0000000000000013 r10 :
0000000000000000
[    0.000000] r11 : 0bad0bad0baa11e9 r12 : a000000100ac7550 r13 :
a000000100ac0000
[    0.000000] r14 : a000000100e44080 r15 : a000000100e44030 r16 :
0000000000000298
[    0.000000] r17 : 0000000000000010 r18 : 0000000000000018 r19 :
a000000100ac7850
[    0.000000] r20 : 0000000000000290 r21 : a000000100ac75b4 r22 :
a000000100c12f20
[    0.000000] r23 : a000000100ac75b0 r24 : 0000000000000000 r25 :
a000000100e44030
[    0.000000] r26 : a0000001007ea718 r27 : 0000000000018869 r28 :
a000000100ac4000
[    0.000000] r29 : 0000000000000014 r30 : 0000000000000000 r31 :
0000000000000792

It looks like from node 64 onward, which would be cores 256 and up,
the kernel cannot find start_pfn and reports 0 total pages. The
instruction pointer is at __copy_user+0x891/0x950. In the meantime I
have compiled the 2.6.35 kernel with support for 1024 CPUs, which
works as a holdover. Does anyone have any ideas as to why it is
failing at this point?
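
For anyone working through the full log, here is a minimal sketch of a
script that lists the nodes reporting 0 total pages and the first CPU
each one would hold. The function names are made up for illustration.
The figure of 4 CPUs per node is an assumption derived from the 448
CPUs and 112 nodes reported above, and it only holds if CPUs are
spread evenly and numbered node by node.

```python
import re

# Matches lines such as "[    0.000000] On node 64 totalpages: 0".
NODE_RE = re.compile(r"On node (\d+) totalpages: (\d+)")


def empty_nodes(lines):
    """Return the sorted node ids whose reported totalpages is 0."""
    return sorted(
        int(m.group(1))
        for line in lines
        for m in [NODE_RE.search(line)]
        if m and int(m.group(2)) == 0
    )


def first_cpu(node, cpus_per_node=4):
    """First CPU id on a node (assumes even, node-by-node numbering)."""
    return node * cpus_per_node


# Three lines taken from the boot log above.
sample = [
    "[    0.000000] On node 63 totalpages: 504832",
    "[    0.000000] On node 64 totalpages: 0",
    "[    0.000000] On node 65 totalpages: 0",
]

for node in empty_nodes(sample):
    print(f"node {node}: no pages, first CPU would be {first_cpu(node)}")
```

To run it over a saved log, pass an open file instead of the sample
list, for example empty_nodes(open("boot.log")), where boot.log is a
placeholder file name.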

Thanks,

Beeij