On Sat, Apr 25, 2009 at 12:52:06AM +1000, Stephen Rothwell wrote: > Hi all, > > The last few days linux-next trees have failed to boot on one of our > PowerPC Power6 servers. The partial boot log looks like this (full > log below): Hi Stephen, That would be SLQB. Must be a corner case I have not covered properly with your NUMA config. Thanks for reporting, I'll take a look. > > memory layout at init: > alloc_bottom : 0000000004dd4000 > alloc_top : 0000000008000000 > alloc_top_hi : 0000000008000000 > rmo_top : 0000000008000000 > ram_top : 0000000008000000 > > physicalMemorySize = 0x80000000 > > Zone PFN ranges: > DMA 0x00000000 -> 0x00080000 > Normal 0x00080000 -> 0x00080000 > Movable zone start PFN for each node > early_node_map[1] active PFN ranges > 1: 0x00000000 -> 0x00080000 > Could not find start_pfn for node 0 > [boot]0015 Setup Done > Built 2 zonelists in Node order, mobility grouping on. Total pages: 510976 > Policy zone: DMA > Kernel command line: root=/dev/VolGroup00/LogVol00 ro console=hvc0 autobench_args: root=/dev/sdb1 ABAT:1240566469 initcall_debug > > Mount-cache hash table entries: 256 > Unable to handle kernel paging request for data at address 0x00000008 > Faulting instruction address: 0xc000000000100524 > Oops: Kernel access of bad area, sig: 11 [#1] > SMP NR_CPUS=128 NUMA pSeries > Modules linked in: > NIP: c000000000100524 LR: c0000000001004f4 CTR: 0000000000000001 > REGS: c000000000aef690 TRAP: 0300 Not tainted (2.6.30-rc1-autokern1) > MSR: 8000000000009032 <EE,ME,IR,DR> CR: 48000082 XER: 00000005 > DAR: 0000000000000008, DSISR: 0000000042000000 > TASK = c0000000009bedd0[0] 'swapper' THREAD: c000000000aec000 CPU: 0 > GPR00: c00000007e001030 c000000000aef910 c000000000aeae68 c000000000b32680 > GPR04: c00000007e001000 c0000000009bf718 0000000000000002 c0000000009bf718 > GPR08: 000000000000001a 0000000000000001 0000000000000000 0000000000000001 > GPR12: 0000000088000084 c000000000b3b280 0000000000000000 0000000003500000 > GPR16: c0000000006b4088 c0000000006b27e8 0000000000000000 00000000003d3800 > GPR20: 0000000003cae870 c0000000007ae870 0000000000000010 0000000000000000 > GPR24: c000000000b4d6f0 f000000003347488 c000000000b32680 f000000003347488 > GPR28: c00000007e001180 c00000007e001000 c000000000a56788 f0000000033474a8 > NIP [c000000000100524] .__slab_alloc_page+0x380/0x3dc > LR [c0000000001004f4] .__slab_alloc_page+0x350/0x3dc > Call Trace: > [c000000000aef910] [c0000000001004f4] .__slab_alloc_page+0x350/0x3dc (unreliable) > [c000000000aef9d0] [c000000000100f14] .__remote_slab_alloc+0x60/0x138 > [c000000000aefa80] [c00000000010284c] .__kmalloc_track_caller+0xb4/0x23c > [c000000000aefb30] [c0000000000df258] .kstrdup+0x4c/0x13c > [c000000000aefbd0] [c000000000128504] .alloc_vfsmnt+0xb0/0x178 > [c000000000aefc70] [c00000000010dfdc] .vfs_kern_mount+0x40/0xf8 > [c000000000aefd10] [c0000000007954a4] .sysfs_init+0x90/0x108 > [c000000000aefdb0] [c00000000079409c] .mnt_init+0xbc/0x254 > [c000000000aefe50] [c000000000793afc] .vfs_caches_init+0x150/0x184 > [c000000000aefee0] [c000000000777a08] .start_kernel+0x420/0x48c > [c000000000aeff90] [c000000000008368] .start_here_common+0x1c/0x34 > Instruction dump: > 60000000 e93d0040 e97d0028 381d0030 7fa4eb78 e95d0030 7f43d378 39290001 > 396b0001 f93d0040 f97d0028 f95b0020 <fbea0008> fbfd0030 f81f0008 4bfffb59 > ---[ end trace 31fd0ba7d8756001 ]--- > Kernel panic - not syncing: Attempted to kill the idle task! > Call Trace: > [c000000000aef2f0] [c00000000000fcb8] .show_stack+0x70/0x184 (unreliable) > [c000000000aef3a0] [c00000000005ace8] .panic+0x80/0x1b4 > [c000000000aef440] [c00000000005f330] .do_exit+0x84/0x6f0 > [c000000000aef500] [c0000000000242ac] .die+0x24c/0x27c > [c000000000aef5a0] [c00000000002c154] .bad_page_fault+0xb8/0xd4 > [c000000000aef620] [c00000000000534c] handle_page_fault+0x3c/0x5c > --- Exception: 300 at .__slab_alloc_page+0x380/0x3dc > LR = .__slab_alloc_page+0x350/0x3dc > [c000000000aef9d0] [c000000000100f14] .__remote_slab_alloc+0x60/0x138 > [c000000000aefa80] [c00000000010284c] .__kmalloc_track_caller+0xb4/0x23c > [c000000000aefb30] [c0000000000df258] .kstrdup+0x4c/0x13c > [c000000000aefbd0] [c000000000128504] .alloc_vfsmnt+0xb0/0x178 > [c000000000aefc70] [c00000000010dfdc] .vfs_kern_mount+0x40/0xf8 > [c000000000aefd10] [c0000000007954a4] .sysfs_init+0x90/0x108 > [c000000000aefdb0] [c00000000079409c] .mnt_init+0xbc/0x254 > [c000000000aefe50] [c000000000793afc] .vfs_caches_init+0x150/0x184 > [c000000000aefee0] [c000000000777a08] .start_kernel+0x420/0x48c > [c000000000aeff90] [c000000000008368] .start_here_common+0x1c/0x34 > Rebooting in 180 seconds.. > > I bisected it down to this: > > commit d8895be47d51305a48dec818803679b9a640cef5 > Author: Nick Piggin <npiggin@xxxxxxx> > Date: Tue Apr 14 18:50:58 2009 +0200 > > mm: prompt slqb default for oldconfig > > Change Kconfig names for slab allocator choices to prod SLQB into being > the default. Hopefully increasing testing base. > > Signed-off-by: Nick Piggin <npiggin@xxxxxxx> > Signed-off-by: Pekka Enberg <penberg@xxxxxxxxxxxxxx> > > But that is just where CONFIG_SLQB gets turned on, so I further bisected > the slab tree down to this: > > commit 13079b7881aa3b027e4babdff901765e88d7e216 > Author: Nick Piggin <npiggin@xxxxxxx> > Date: Tue Feb 3 15:07:12 2009 +0100 > > slqb: dynamic array allocations > > Implement dynamic allocation for SLQB per-cpu and per-node arrays. This > should hopefully have minimal runtime performance impact, because although > there is an extra level of indirection to do allocations, the pointer should > be in the cache hot area of the struct kmem_cache. > > It's not quite possible to use dynamic percpu allocator for this: firstly, > that subsystem uses the slab allocator. Secondly, it doesn't have good > support for per-node data. If those problems were improved, we could use it. > For now, just implement a very very simple allocator until the kmalloc > caches are up. > > On x86-64 with a NUMA MAXCPUS config, sizes look like this: > text data bss dec hex filename > 29960 259565 100 289625 46b59 mm/slab.o > 34130 497130 696 531956 81df4 mm/slub.o > 24575 1634267 111136 1769978 1b01fa mm/slqb.o > 24845 13959 712 39516 9a5c mm/slqb.o + this patch > > SLQB is now 2 orders of magnitude smaller than it was, and an order of > magnitude smaller than SLAB or SLUB (in total size -- text size has > always been smaller). So it should now be very suitable for distro-type > configs in this respect. > > As a side-effect the UP version of cpu_slab (which is embedded directly > in the kmem_cache struct) moves up to the hot cachelines, so it need no > longer be cacheline aligned on UP. The overall result should be a > reduction in cacheline footprint on UP kernels. > > Signed-off-by: Nick Piggin <npiggin@xxxxxxx> > Signed-off-by: Pekka Enberg <penberg@xxxxxxxxxxxxxx> > > -- > Cheers, > Stephen Rothwell sfr@xxxxxxxxxxxxxxxx > http://www.canb.auug.org.au/~sfr/ > > Please wait, loading kernel... > Elf64 kernel loaded... > Loading ramdisk... > ramdisk loaded at 04a00000, size: 3918 Kbytes > OF stdout device is: /vdevice/vty@30000000 > Preparing to boot Linux version 2.6.30-rc1-autokern1 (root@xxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Fri Apr 24 09:45:10 UTC 2009 > Calling ibm,client-architecture... done > command line: root=/dev/VolGroup00/LogVol00 ro console=hvc0 autobench_args: root=/dev/sdb1 ABAT:1240566469 initcall_debug > memory layout at init: > alloc_bottom : 0000000004dd4000 > alloc_top : 0000000008000000 > alloc_top_hi : 0000000008000000 > rmo_top : 0000000008000000 > ram_top : 0000000008000000 > instantiating rtas at 0x00000000074e6000... done > boot cpu hw idx 0000000000000000 > starting cpu hw idx 0000000000000002... done > copying OF device tree... > Building dt strings... > Building dt structure... > Device tree strings 0x0000000004ed5000 -> 0x0000000004ed661a > Device tree struct 0x0000000004ed7000 -> 0x0000000004ef0000 > Calling quiesce... > returning from prom_init > Phyp-dump disabled at boot time > Using pSeries machine description > Using 1TB segments > Found initrd at 0xc000000004a00000:0xc000000004dd3800 > console [udbg0] enabled > Partition configured for 4 cpus. > CPU maps initialized for 2 threads per core > Starting Linux PPC64 #1 SMP Fri Apr 24 09:45:10 UTC 2009 > ----------------------------------------------------- > ppc64_pft_size = 0x19 > physicalMemorySize = 0x80000000 > htab_hash_mask = 0x3ffff > ----------------------------------------------------- > Initializing cgroup subsys cpuset > Initializing cgroup subsys cpu > Linux version 2.6.30-rc1-autokern1 (root@xxxxxxxxxxxxxxxxxxxxxxxxxx) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Fri Apr 24 09:45:10 UTC 2009 > [boot]0012 Setup Arch > EEH: No capable adapters found > PPC64 nvram contains 15360 bytes > Zone PFN ranges: > DMA 0x00000000 -> 0x00080000 > Normal 0x00080000 -> 0x00080000 > Movable zone start PFN for each node > early_node_map[1] active PFN ranges > 1: 0x00000000 -> 0x00080000 > Could not find start_pfn for node 0 > [boot]0015 Setup Done > Built 2 zonelists in Node order, mobility grouping on. Total pages: 510976 > Policy zone: DMA > Kernel command line: root=/dev/VolGroup00/LogVol00 ro console=hvc0 autobench_args: root=/dev/sdb1 ABAT:1240566469 initcall_debug > NR_IRQS:512 > [boot]0020 XICS Init > [boot]0021 XICS Done > PID hash table entries: 4096 (order: 12, 32768 bytes) > clocksource: timebase mult[7d0000] shift[22] registered > Console: colour dummy device 80x25 > console handover: boot [udbg0] -> real [hvc0] > Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar > ... MAX_LOCKDEP_SUBCLASSES: 8 > ... MAX_LOCK_DEPTH: 48 > ... MAX_LOCKDEP_KEYS: 8191 > ... CLASSHASH_SIZE: 4096 > ... MAX_LOCKDEP_ENTRIES: 8192 > ... MAX_LOCKDEP_CHAINS: 16384 > ... CHAINHASH_SIZE: 8192 > memory used by lock dependency info: 5119 kB > per task-struct memory footprint: 2688 bytes > allocated 20971520 bytes of page_cgroup > please try cgroup_disable=memory option if you don't want > freeing bootmem node 1 > Memory: 1967824k/2097152k available (9752k kernel code, 129328k reserved, 1420k data, 8422k bss, 2108k init) > Calibrating delay loop... 1021.95 BogoMIPS (lpj=2043904) > Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) > Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) > Mount-cache hash table entries: 256 > Unable to handle kernel paging request for data at address 0x00000008 > Faulting instruction address: 0xc000000000100524 > Oops: Kernel access of bad area, sig: 11 [#1] > SMP NR_CPUS=128 NUMA pSeries > Modules linked in: > NIP: c000000000100524 LR: c0000000001004f4 CTR: 0000000000000001 > REGS: c000000000aef690 TRAP: 0300 Not tainted (2.6.30-rc1-autokern1) > MSR: 8000000000009032 <EE,ME,IR,DR> CR: 48000082 XER: 00000005 > DAR: 0000000000000008, DSISR: 0000000042000000 > TASK = c0000000009bedd0[0] 'swapper' THREAD: c000000000aec000 CPU: 0 > GPR00: c00000007e001030 c000000000aef910 c000000000aeae68 c000000000b32680 > GPR04: c00000007e001000 c0000000009bf718 0000000000000002 c0000000009bf718 > GPR08: 000000000000001a 0000000000000001 0000000000000000 0000000000000001 > GPR12: 0000000088000084 c000000000b3b280 0000000000000000 0000000003500000 > GPR16: c0000000006b4088 c0000000006b27e8 0000000000000000 00000000003d3800 > GPR20: 0000000003cae870 c0000000007ae870 0000000000000010 0000000000000000 > GPR24: c000000000b4d6f0 f000000003347488 c000000000b32680 f000000003347488 > GPR28: c00000007e001180 c00000007e001000 c000000000a56788 f0000000033474a8 > NIP [c000000000100524] .__slab_alloc_page+0x380/0x3dc > LR [c0000000001004f4] .__slab_alloc_page+0x350/0x3dc > Call Trace: > [c000000000aef910] [c0000000001004f4] .__slab_alloc_page+0x350/0x3dc (unreliable) > [c000000000aef9d0] [c000000000100f14] .__remote_slab_alloc+0x60/0x138 > [c000000000aefa80] [c00000000010284c] .__kmalloc_track_caller+0xb4/0x23c > [c000000000aefb30] [c0000000000df258] .kstrdup+0x4c/0x13c > [c000000000aefbd0] [c000000000128504] .alloc_vfsmnt+0xb0/0x178 > [c000000000aefc70] [c00000000010dfdc] .vfs_kern_mount+0x40/0xf8 > [c000000000aefd10] [c0000000007954a4] .sysfs_init+0x90/0x108 > [c000000000aefdb0] [c00000000079409c] .mnt_init+0xbc/0x254 > [c000000000aefe50] [c000000000793afc] .vfs_caches_init+0x150/0x184 > [c000000000aefee0] [c000000000777a08] .start_kernel+0x420/0x48c > [c000000000aeff90] [c000000000008368] .start_here_common+0x1c/0x34 > Instruction dump: > 60000000 e93d0040 e97d0028 381d0030 7fa4eb78 e95d0030 7f43d378 39290001 > 396b0001 f93d0040 f97d0028 f95b0020 <fbea0008> fbfd0030 f81f0008 4bfffb59 > ---[ end trace 31fd0ba7d8756001 ]--- > Kernel panic - not syncing: Attempted to kill the idle task! > Call Trace: > [c000000000aef2f0] [c00000000000fcb8] .show_stack+0x70/0x184 (unreliable) > [c000000000aef3a0] [c00000000005ace8] .panic+0x80/0x1b4 > [c000000000aef440] [c00000000005f330] .do_exit+0x84/0x6f0 > [c000000000aef500] [c0000000000242ac] .die+0x24c/0x27c > [c000000000aef5a0] [c00000000002c154] .bad_page_fault+0xb8/0xd4 > [c000000000aef620] [c00000000000534c] handle_page_fault+0x3c/0x5c > --- Exception: 300 at .__slab_alloc_page+0x380/0x3dc > LR = .__slab_alloc_page+0x350/0x3dc > [c000000000aef9d0] [c000000000100f14] .__remote_slab_alloc+0x60/0x138 > [c000000000aefa80] [c00000000010284c] .__kmalloc_track_caller+0xb4/0x23c > [c000000000aefb30] [c0000000000df258] .kstrdup+0x4c/0x13c > [c000000000aefbd0] [c000000000128504] .alloc_vfsmnt+0xb0/0x178 > [c000000000aefc70] [c00000000010dfdc] .vfs_kern_mount+0x40/0xf8 > [c000000000aefd10] [c0000000007954a4] .sysfs_init+0x90/0x108 > [c000000000aefdb0] [c00000000079409c] .mnt_init+0xbc/0x254 > [c000000000aefe50] [c000000000793afc] .vfs_caches_init+0x150/0x184 > [c000000000aefee0] [c000000000777a08] .start_kernel+0x420/0x48c > [c000000000aeff90] [c000000000008368] .start_here_common+0x1c/0x34 > Rebooting in 180 seconds.. -- To unsubscribe from this list: send the line "unsubscribe linux-next" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html