Hi all, The last few days linux-next trees have failed to boot on one of our PowerPC Power6 servers. The partial boot log looks like this (full log below): memory layout at init: alloc_bottom : 0000000004dd4000 alloc_top : 0000000008000000 alloc_top_hi : 0000000008000000 rmo_top : 0000000008000000 ram_top : 0000000008000000 physicalMemorySize = 0x80000000 Zone PFN ranges: DMA 0x00000000 -> 0x00080000 Normal 0x00080000 -> 0x00080000 Movable zone start PFN for each node early_node_map[1] active PFN ranges 1: 0x00000000 -> 0x00080000 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. Total pages: 510976 Policy zone: DMA Kernel command line: root=/dev/VolGroup00/LogVol00 ro console=hvc0 autobench_args: root=/dev/sdb1 ABAT:1240566469 initcall_debug Mount-cache hash table entries: 256 Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc000000000100524 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=128 NUMA pSeries Modules linked in: NIP: c000000000100524 LR: c0000000001004f4 CTR: 0000000000000001 REGS: c000000000aef690 TRAP: 0300 Not tainted (2.6.30-rc1-autokern1) MSR: 8000000000009032 <EE,ME,IR,DR> CR: 48000082 XER: 00000005 DAR: 0000000000000008, DSISR: 0000000042000000 TASK = c0000000009bedd0[0] 'swapper' THREAD: c000000000aec000 CPU: 0 GPR00: c00000007e001030 c000000000aef910 c000000000aeae68 c000000000b32680 GPR04: c00000007e001000 c0000000009bf718 0000000000000002 c0000000009bf718 GPR08: 000000000000001a 0000000000000001 0000000000000000 0000000000000001 GPR12: 0000000088000084 c000000000b3b280 0000000000000000 0000000003500000 GPR16: c0000000006b4088 c0000000006b27e8 0000000000000000 00000000003d3800 GPR20: 0000000003cae870 c0000000007ae870 0000000000000010 0000000000000000 GPR24: c000000000b4d6f0 f000000003347488 c000000000b32680 f000000003347488 GPR28: c00000007e001180 c00000007e001000 c000000000a56788 f0000000033474a8 NIP [c000000000100524] .__slab_alloc_page+0x380/0x3dc LR [c0000000001004f4] .__slab_alloc_page+0x350/0x3dc Call Trace: [c000000000aef910] [c0000000001004f4] .__slab_alloc_page+0x350/0x3dc (unreliable) [c000000000aef9d0] [c000000000100f14] .__remote_slab_alloc+0x60/0x138 [c000000000aefa80] [c00000000010284c] .__kmalloc_track_caller+0xb4/0x23c [c000000000aefb30] [c0000000000df258] .kstrdup+0x4c/0x13c [c000000000aefbd0] [c000000000128504] .alloc_vfsmnt+0xb0/0x178 [c000000000aefc70] [c00000000010dfdc] .vfs_kern_mount+0x40/0xf8 [c000000000aefd10] [c0000000007954a4] .sysfs_init+0x90/0x108 [c000000000aefdb0] [c00000000079409c] .mnt_init+0xbc/0x254 [c000000000aefe50] [c000000000793afc] .vfs_caches_init+0x150/0x184 [c000000000aefee0] [c000000000777a08] .start_kernel+0x420/0x48c [c000000000aeff90] [c000000000008368] .start_here_common+0x1c/0x34 Instruction dump: 60000000 e93d0040 e97d0028 381d0030 7fa4eb78 e95d0030 7f43d378 39290001 396b0001 f93d0040 f97d0028 f95b0020 <fbea0008> fbfd0030 f81f0008 4bfffb59 ---[ end trace 31fd0ba7d8756001 ]--- Kernel panic - not syncing: Attempted to kill the idle task! Call Trace: [c000000000aef2f0] [c00000000000fcb8] .show_stack+0x70/0x184 (unreliable) [c000000000aef3a0] [c00000000005ace8] .panic+0x80/0x1b4 [c000000000aef440] [c00000000005f330] .do_exit+0x84/0x6f0 [c000000000aef500] [c0000000000242ac] .die+0x24c/0x27c [c000000000aef5a0] [c00000000002c154] .bad_page_fault+0xb8/0xd4 [c000000000aef620] [c00000000000534c] handle_page_fault+0x3c/0x5c --- Exception: 300 at .__slab_alloc_page+0x380/0x3dc LR = .__slab_alloc_page+0x350/0x3dc [c000000000aef9d0] [c000000000100f14] .__remote_slab_alloc+0x60/0x138 [c000000000aefa80] [c00000000010284c] .__kmalloc_track_caller+0xb4/0x23c [c000000000aefb30] [c0000000000df258] .kstrdup+0x4c/0x13c [c000000000aefbd0] [c000000000128504] .alloc_vfsmnt+0xb0/0x178 [c000000000aefc70] [c00000000010dfdc] .vfs_kern_mount+0x40/0xf8 [c000000000aefd10] [c0000000007954a4] .sysfs_init+0x90/0x108 [c000000000aefdb0] [c00000000079409c] .mnt_init+0xbc/0x254 [c000000000aefe50] [c000000000793afc] .vfs_caches_init+0x150/0x184 [c000000000aefee0] [c000000000777a08] .start_kernel+0x420/0x48c [c000000000aeff90] [c000000000008368] .start_here_common+0x1c/0x34 Rebooting in 180 seconds.. I bisected it down to this: commit d8895be47d51305a48dec818803679b9a640cef5 Author: Nick Piggin <npiggin@xxxxxxx> Date: Tue Apr 14 18:50:58 2009 +0200 mm: prompt slqb default for oldconfig Change Kconfig names for slab allocator choices to prod SLQB into being the default. Hopefully increasing testing base. Signed-off-by: Nick Piggin <npiggin@xxxxxxx> Signed-off-by: Pekka Enberg <penberg@xxxxxxxxxxxxxx> But that is just where CONFIG_SLQB gets turned on, so I further bisected the slab tree down to this: commit 13079b7881aa3b027e4babdff901765e88d7e216 Author: Nick Piggin <npiggin@xxxxxxx> Date: Tue Feb 3 15:07:12 2009 +0100 slqb: dynamic array allocations Implement dynamic allocation for SLQB per-cpu and per-node arrays. This should hopefully have minimal runtime performance impact, because although there is an extra level of indirection to do allocations, the pointer should be in the cache hot area of the struct kmem_cache. It's not quite possible to use dynamic percpu allocator for this: firstly, that subsystem uses the slab allocator. Secondly, it doesn't have good support for per-node data. If those problems were improved, we could use it. For now, just implement a very very simple allocator until the kmalloc caches are up. On x86-64 with a NUMA MAXCPUS config, sizes look like this: text data bss dec hex filename 29960 259565 100 289625 46b59 mm/slab.o 34130 497130 696 531956 81df4 mm/slub.o 24575 1634267 111136 1769978 1b01fa mm/slqb.o 24845 13959 712 39516 9a5c mm/slqb.o + this patch SLQB is now 2 orders of magnitude smaller than it was, and an order of magnitude smaller than SLAB or SLUB (in total size -- text size has always been smaller). So it should now be very suitable for distro-type configs in this respect. As a side-effect the UP version of cpu_slab (which is embedded directly in the kmem_cache struct) moves up to the hot cachelines, so it need no longer be cacheline aligned on UP. The overall result should be a reduction in cacheline footprint on UP kernels. Signed-off-by: Nick Piggin <npiggin@xxxxxxx> Signed-off-by: Pekka Enberg <penberg@xxxxxxxxxxxxxx> -- Cheers, Stephen Rothwell sfr@xxxxxxxxxxxxxxxx http://www.canb.auug.org.au/~sfr/ Please wait, loading kernel... Elf64 kernel loaded... Loading ramdisk... ramdisk loaded at 04a00000, size: 3918 Kbytes OF stdout device is: /vdevice/vty@30000000 Preparing to boot Linux version 2.6.30-rc1-autokern1 (root@xxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Fri Apr 24 09:45:10 UTC 2009 Calling ibm,client-architecture... done command line: root=/dev/VolGroup00/LogVol00 ro console=hvc0 autobench_args: root=/dev/sdb1 ABAT:1240566469 initcall_debug memory layout at init: alloc_bottom : 0000000004dd4000 alloc_top : 0000000008000000 alloc_top_hi : 0000000008000000 rmo_top : 0000000008000000 ram_top : 0000000008000000 instantiating rtas at 0x00000000074e6000... done boot cpu hw idx 0000000000000000 starting cpu hw idx 0000000000000002... done copying OF device tree... Building dt strings... Building dt structure... Device tree strings 0x0000000004ed5000 -> 0x0000000004ed661a Device tree struct 0x0000000004ed7000 -> 0x0000000004ef0000 Calling quiesce... returning from prom_init Phyp-dump disabled at boot time Using pSeries machine description Using 1TB segments Found initrd at 0xc000000004a00000:0xc000000004dd3800 console [udbg0] enabled Partition configured for 4 cpus. CPU maps initialized for 2 threads per core Starting Linux PPC64 #1 SMP Fri Apr 24 09:45:10 UTC 2009 ----------------------------------------------------- ppc64_pft_size = 0x19 physicalMemorySize = 0x80000000 htab_hash_mask = 0x3ffff ----------------------------------------------------- Initializing cgroup subsys cpuset Initializing cgroup subsys cpu Linux version 2.6.30-rc1-autokern1 (root@xxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Fri Apr 24 09:45:10 UTC 2009 [boot]0012 Setup Arch EEH: No capable adapters found PPC64 nvram contains 15360 bytes Zone PFN ranges: DMA 0x00000000 -> 0x00080000 Normal 0x00080000 -> 0x00080000 Movable zone start PFN for each node early_node_map[1] active PFN ranges 1: 0x00000000 -> 0x00080000 Could not find start_pfn for node 0 [boot]0015 Setup Done Built 2 zonelists in Node order, mobility grouping on. Total pages: 510976 Policy zone: DMA Kernel command line: root=/dev/VolGroup00/LogVol00 ro console=hvc0 autobench_args: root=/dev/sdb1 ABAT:1240566469 initcall_debug NR_IRQS:512 [boot]0020 XICS Init [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) clocksource: timebase mult[7d0000] shift[22] registered Console: colour dummy device 80x25 console handover: boot [udbg0] -> real [hvc0] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar ... MAX_LOCKDEP_SUBCLASSES: 8 ... MAX_LOCK_DEPTH: 48 ... MAX_LOCKDEP_KEYS: 8191 ... CLASSHASH_SIZE: 4096 ... MAX_LOCKDEP_ENTRIES: 8192 ... MAX_LOCKDEP_CHAINS: 16384 ... CHAINHASH_SIZE: 8192 memory used by lock dependency info: 5119 kB per task-struct memory footprint: 2688 bytes allocated 20971520 bytes of page_cgroup please try cgroup_disable=memory option if you don't want freeing bootmem node 1 Memory: 1967824k/2097152k available (9752k kernel code, 129328k reserved, 1420k data, 8422k bss, 2108k init) Calibrating delay loop... 1021.95 BogoMIPS (lpj=2043904) Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) Mount-cache hash table entries: 256 Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc000000000100524 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=128 NUMA pSeries Modules linked in: NIP: c000000000100524 LR: c0000000001004f4 CTR: 0000000000000001 REGS: c000000000aef690 TRAP: 0300 Not tainted (2.6.30-rc1-autokern1) MSR: 8000000000009032 <EE,ME,IR,DR> CR: 48000082 XER: 00000005 DAR: 0000000000000008, DSISR: 0000000042000000 TASK = c0000000009bedd0[0] 'swapper' THREAD: c000000000aec000 CPU: 0 GPR00: c00000007e001030 c000000000aef910 c000000000aeae68 c000000000b32680 GPR04: c00000007e001000 c0000000009bf718 0000000000000002 c0000000009bf718 GPR08: 000000000000001a 0000000000000001 0000000000000000 0000000000000001 GPR12: 0000000088000084 c000000000b3b280 0000000000000000 0000000003500000 GPR16: c0000000006b4088 c0000000006b27e8 0000000000000000 00000000003d3800 GPR20: 0000000003cae870 c0000000007ae870 0000000000000010 0000000000000000 GPR24: c000000000b4d6f0 f000000003347488 c000000000b32680 f000000003347488 GPR28: c00000007e001180 c00000007e001000 c000000000a56788 f0000000033474a8 NIP [c000000000100524] .__slab_alloc_page+0x380/0x3dc LR [c0000000001004f4] .__slab_alloc_page+0x350/0x3dc Call Trace: [c000000000aef910] [c0000000001004f4] .__slab_alloc_page+0x350/0x3dc (unreliable) [c000000000aef9d0] [c000000000100f14] .__remote_slab_alloc+0x60/0x138 [c000000000aefa80] [c00000000010284c] .__kmalloc_track_caller+0xb4/0x23c [c000000000aefb30] [c0000000000df258] .kstrdup+0x4c/0x13c [c000000000aefbd0] [c000000000128504] .alloc_vfsmnt+0xb0/0x178 [c000000000aefc70] [c00000000010dfdc] .vfs_kern_mount+0x40/0xf8 [c000000000aefd10] [c0000000007954a4] .sysfs_init+0x90/0x108 [c000000000aefdb0] [c00000000079409c] .mnt_init+0xbc/0x254 [c000000000aefe50] [c000000000793afc] .vfs_caches_init+0x150/0x184 [c000000000aefee0] [c000000000777a08] .start_kernel+0x420/0x48c [c000000000aeff90] [c000000000008368] .start_here_common+0x1c/0x34 Instruction dump: 60000000 e93d0040 e97d0028 381d0030 7fa4eb78 e95d0030 7f43d378 39290001 396b0001 f93d0040 f97d0028 f95b0020 <fbea0008> fbfd0030 f81f0008 4bfffb59 ---[ end trace 31fd0ba7d8756001 ]--- Kernel panic - not syncing: Attempted to kill the idle task! Call Trace: [c000000000aef2f0] [c00000000000fcb8] .show_stack+0x70/0x184 (unreliable) [c000000000aef3a0] [c00000000005ace8] .panic+0x80/0x1b4 [c000000000aef440] [c00000000005f330] .do_exit+0x84/0x6f0 [c000000000aef500] [c0000000000242ac] .die+0x24c/0x27c [c000000000aef5a0] [c00000000002c154] .bad_page_fault+0xb8/0xd4 [c000000000aef620] [c00000000000534c] handle_page_fault+0x3c/0x5c --- Exception: 300 at .__slab_alloc_page+0x380/0x3dc LR = .__slab_alloc_page+0x350/0x3dc [c000000000aef9d0] [c000000000100f14] .__remote_slab_alloc+0x60/0x138 [c000000000aefa80] [c00000000010284c] .__kmalloc_track_caller+0xb4/0x23c [c000000000aefb30] [c0000000000df258] .kstrdup+0x4c/0x13c [c000000000aefbd0] [c000000000128504] .alloc_vfsmnt+0xb0/0x178 [c000000000aefc70] [c00000000010dfdc] .vfs_kern_mount+0x40/0xf8 [c000000000aefd10] [c0000000007954a4] .sysfs_init+0x90/0x108 [c000000000aefdb0] [c00000000079409c] .mnt_init+0xbc/0x254 [c000000000aefe50] [c000000000793afc] .vfs_caches_init+0x150/0x184 [c000000000aefee0] [c000000000777a08] .start_kernel+0x420/0x48c [c000000000aeff90] [c000000000008368] .start_here_common+0x1c/0x34 Rebooting in 180 seconds..
Attachment:
pgp036Bn43E8Z.pgp
Description: PGP signature