On Thu, 2005-10-27 at 14:16 -0400, Dave Anderson wrote:
> Badari Pulavarty wrote:
> > On Thu, 2005-10-27 at 13:17 -0400, Dave Anderson wrote:
> > > Badari Pulavarty wrote:
> > > > > That debug output certainly seems to pinpoint the issue at
> > > > > hand, doesn't it?  Very interesting...
> > > > >
> > > > > What's strange is that the usage of the cpu_pda[i].data_offset
> > > > > by the per_cpu() macro in "include/asm-x86_64/percpu.h" is
> > > > > unchanged.
> > > > >
> > > > > It's probably something very simple going on here, but I don't
> > > > > have any more ideas at this point.
> > > >
> > > > This is the reply I got from Andi Kleen:
> > > >
> > > > -------- Forwarded Message --------
> > > > From: Andi Kleen <ak@xxxxxxx>
> > > > To: Badari Pulavarty <pbadari@xxxxxxxxxx>
> > > > Subject: Re: cpu_pda->data_offset changed recently ?
> > > > Date: Thu, 27 Oct 2005 16:58:54 +0200
> > > >
> > > > On Thursday 27 October 2005 16:53, Badari Pulavarty wrote:
> > > > > Hi Andi,
> > > > >
> > > > > I am trying to fix the "crash" utility to make it work on
> > > > > 2.6.14-rc5.  (It's running fine on 2.6.10.)  The crash utility
> > > > > reads and uses cpu_pda->data_offset values, and some change
> > > > > between 2.6.10 and 2.6.14-rc5 is causing "data_offset" to take
> > > > > on huge values - which is causing "crash" to break.
> > > > >
> > > > > I added a printk() to find out why.  As you can see from the
> > > > > following, something changed - is this expected?  Please let
> > > > > me know.
> > > >
> > > > bootmem used to allocate from the end of the direct mapping on
> > > > NUMA systems.  Now it starts at the beginning, often before the
> > > > kernel .text.  This means it is negative.  Perfectly legitimate.
> > > > crash just has to handle it.
> > > >
> > > > -Andi
> > > >
> > > > --
> > >
> > > That's what I thought it looked like, although the
> > > x8664_pda.data_offset field is an "unsigned long".  Anyway, if
> > > you take any of the per_cpu__xxx symbols from the 2.6.14 kernel
> > > and apply a cpu's data_offset, does it come up with a legitimate
> > > virtual address?
> >
> > Unfortunately, I don't know the x86-64 kernel virtual address
> > space well enough to answer your question.
> >
> > My understanding is that x86-64 kernel addresses look something
> > like:
> >
> > addr: ffffffff80101000
> >
> > But now (2.6.14-rc5) I do see addresses like:
> >
> > pgdat: 0xffff81000000e000
> >
> > which are causing read problems:
> >
> > crash: read error: kernel virtual address: ffff81000000fa90  type:
> > "pglist_data node_next"
> >
> > I am not sure what these addresses are and whether they are valid.
> > Is there a way to verify them, through gdb or /dev/kmem or
> > something like that?
> >
> > Thanks,
> > Badari
> >
> > Here is the bottom line we need to understand to fix the problem:
> >
> > 2.6.10:      pgdat: 0x1000000e000
> > 2.6.14-rc5:  pgdat: 0xffff81000000e000
>
> Exactly.
>
> On a 2.6.9 kernel, if you do an nm -Bn on the vmlinux file, you'll
> first see a bunch of "A" type absolute symbols, followed by the text
> symbols, then readonly data, data, and so on.  Eventually you'll
> bump into the per-cpu symbols:
>
> $ nm -Bn vmlinux
> 0000000000088861 A __crc_dev_mc_delete
> 000000000014bfd1 A __crc_smp_call_function
> 00000000002de2e0 A __crc___skb_linearize
> 0000000000442f14 A __crc_tty_register_device
> 000000000060e766 A __crc_tty_termios_baud_rate
> 0000000000712c54 A __crc_remove_inode_hash
> 00000000007f8e0b A __crc_xfrm_policy_alloc
> 0000000000801678 A __crc_flush_scheduled_work
> 0000000000a64d75 A __crc_neigh_changeaddr
> ... <snip> ...
> 00000000ffdf0b3d A __crc_usb_driver_release_interface
> 00000000ffe031fc A __crc_udp_proc_unregister
> 00000000ffead192 A __crc_cdrom_number_of_slots
> 00000000fff9536b A __crc_sock_no_recvmsg
> 00000000fffb8df8 A __crc_device_unregister
> ffffffff80100000 t startup_32
> ffffffff80100000 A _text
> ffffffff80100081 t reach_compatibility_mode
> ffffffff8010008e t second
> ffffffff80100100 t reach_long64
> ffffffff8010013d T initial_code
> ffffffff80100145 T init_rsp
> ffffffff80100150 T no_long_mode
> ffffffff80100f00 T pGDT32
> ffffffff80100f10 t ljumpvector
> ffffffff80100f18 T stext
> ffffffff80100f18 T _stext
> ffffffff80101000 T init_level4_pgt
> ffffffff80102000 T level3_ident_pgt
> ... <snip> ...
> ffffffff80502100 D per_cpu__init_tss
> ffffffff80502200 d per_cpu__prof_old_multiplier
> ffffffff80502204 d per_cpu__prof_multiplier
> ffffffff80502208 d per_cpu__prof_counter
> ffffffff80502220 D per_cpu__mmu_gathers
> ffffffff80503280 D per_cpu__kstat
> ffffffff80503680 d per_cpu__runqueues
> ffffffff805048e0 d per_cpu__cpu_domains
> ffffffff80504940 d per_cpu__phys_domains
> ffffffff805049a0 d per_cpu__node_domains
> ffffffff805049f8 D per_cpu__process_counts
> ffffffff80504a00 d per_cpu__tasklet_hi_vec
> ffffffff80504a08 d per_cpu__tasklet_vec
> ffffffff80504a10 d per_cpu__ksoftirqd
> ffffffff80504a80 d per_cpu__tvec_bases
> ffffffff80506b00 D per_cpu__rcu_bh_data
> ffffffff80506b60 D per_cpu__rcu_data
> ffffffff80506bc0 d per_cpu__rcu_tasklet
> ...
>
> So for any data that was specifically created per-cpu, the symbol
> above is the starting point, but to get to a given cpu's copy of
> the structure, the offset value from cpu_pda.data_offset needs to
> be applied.
>
> What I don't understand is where the 0xffff810000000000 addresses
> come into play.  Are you seeing them as actual symbols?
>
> Dave
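To spell out the arithmetic we're both describing: per_cpu()
resolves a per-cpu variable as its per_cpu__xxx symbol value plus
the cpu's data_offset, so whatever crash computes should reduce to
something like the sketch below.  per_cpu_addr() and the example
offset are my illustration only, not crash's or the kernel's
actual code:

#include <stdint.h>
#include <stdio.h>

static uint64_t per_cpu_addr(uint64_t per_cpu_symbol, uint64_t data_offset)
{
        /*
         * x8664_pda.data_offset is declared "unsigned long", but per
         * Andi's note the 2.6.14 per-cpu area can sit below the
         * kernel text, so the stored value is effectively negative
         * and prints as a huge number.  Unsigned 64-bit wraparound
         * still makes the addition land on the right virtual address.
         */
        return per_cpu_symbol + data_offset;
}

int main(void)
{
        /* per_cpu__runqueues from the nm listing above; the offset is
         * a made-up example of a "negative" 2.6.14-style value that
         * relocates the symbol into the 0xffff8100... direct mapping. */
        uint64_t sym    = 0xffffffff80503680ULL;
        uint64_t offset = 0xffff810000010000ULL - sym;

        printf("per-cpu copy at %#llx\n",
               (unsigned long long)per_cpu_addr(sym, offset));
        return 0;
}

With data_offset effectively negative, the value looks huge when
printed as unsigned, but the wrapped addition still lands in the new
0xffff8100... region - which is exactly where my pgdat addresses are
showing up.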
It looks like the 4-level page table change altered the layout, and
0xffff810000000000 is now a valid kernel virtual address.  From
Documentation/x86_64/mm.txt:

Virtual memory map with 4 level page tables:

0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40bits) guard hole
ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of phys. memory
ffffc10000000000 - ffffc1ffffffffff (=40bits) hole
ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space
... unused hole ...
ffffffff80000000 - ffffffff82800000 (=40MB)   kernel text mapping, from phys 0
... unused hole ...
ffffffff88000000 - fffffffffff00000 (=1919MB) module mapping space

Thanks,
Badari
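P.S.  To partly answer my own earlier question about verifying these
addresses: on a live 2.6.14-rc5 system, something like
"gdb vmlinux /proc/kcore" followed by "x/gx 0xffff81000000e000"
should show whether the address is readable.  And here is a
throwaway sketch for checking an address against the map above - the
ranges are transcribed straight from mm.txt, and the physical-address
math assumes the direct mapping starts at physical 0:

#include <stdio.h>

/* Ranges transcribed from the Documentation/x86_64/mm.txt map above. */
static const char *classify(unsigned long long v)
{
        if (v >= 0xffff810000000000ULL && v <= 0xffffc0ffffffffffULL)
                return "direct mapping of phys. memory";
        if (v >= 0xffffc20000000000ULL && v <= 0xffffe1ffffffffffULL)
                return "vmalloc/ioremap space";
        if (v >= 0xffffffff80000000ULL && v <= 0xffffffff82800000ULL)
                return "kernel text mapping";
        if (v >= 0xffffffff88000000ULL && v <= 0xfffffffffff00000ULL)
                return "module mapping space";
        return "hole / not a mapped kernel region";
}

int main(void)
{
        unsigned long long pgdat = 0xffff81000000e000ULL;

        /* A direct-mapped address translates to physical by
         * subtracting the region base. */
        printf("%#llx: %s (phys %#llx)\n", pgdat, classify(pgdat),
               pgdat - 0xffff810000000000ULL);
        return 0;
}

Run on the pgdat above, this reports physical 0xe000.  The 2.6.10
value 0x1000000e000 minus the old PAGE_OFFSET (0x0000010000000000,
which that value itself suggests) is also 0xe000, so both kernels
appear to put the pgdat at the same physical location.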