sharyathi nagesh wrote:
> Hi
> I am seeing this problem with crash tool on a system with NUMA nodes.
> crash exits with error message and no further analysis of dump is possible.
> =====
> Error message:
>
> cassinilp1:~ # crash
>
> crash 4.0-3.14
> Copyright (C) 2002, 2003, 2004, 2005, 2006 Red Hat, Inc.
> Copyright (C) 2004, 2005, 2006 IBM Corporation
> Copyright (C) 1999-2006 Hewlett-Packard Co
> Copyright (C) 2005 Fujitsu Limited
> Copyright (C) 2005 NEC Corporation
> Copyright (C) 1999, 2002 Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions. Enter "help copying" to see the conditions.
> This program has absolutely no warranty. Enter "help warranty" for details.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "powerpc64-unknown-linux-gnu"...
>
> crash: numnodes out of sync with pgdat_list?
>
> =====
> System configuration is given as
>
> Node 0 Memory:
> Node 1 Memory:
> Node 2 Memory:
> Node 3 Memory:
> Node 4 Memory: 0x0-0x180000000
>
> Node 0 CPUs: 0
> Node 1 CPUs:
> Node 2 CPUs:
> Node 3 CPUs:
> Node 4 CPUs: 1
> =====
> The problem is noticed because of mismatch:
>
>     if (n != vt->numnodes)
>         error(FATAL, "numnodes out of sync with pgdat_list?\n");
> in memory.c/dump_memory_nodes() function
>
> The problem is because of the mismatch between node_online_map and the
> number of nodes observed by traversing through pgdat_list.
> node_online_map bit is set differently in kernel version 2.6.16 and 2.6.19.
> In earlier versions, all the bits from the first bit to the nth bit,
> where n is the last node to which memory is assigned, are set to '1'.
> But in later versions a node is considered online if either memory or
> a CPU is allocated to it (or both).
>
> So I need your suggestion on how to go about fixing the problem.
> A few ideas I had were:
> 1) If KERNEL_VERSION <= 2.6.16, increment vt->numnodes only if the bits
>    of node_online_map and cpu_online_map are set.
>    If KERNEL_VERSION > 2.6.16, use only node_online_map.
>    (This will partly solve the problem)
> 2) Or, as in node_table_init(), raise the error only when CRASHDEBUG(2)
>    is set, else update vt->numnodes with 'n'.
>
> Please let me know your opinion.
> Regards
> Sharyathi Nagesh
>

Hi Sharyathi,

Thanks a lot for debugging this. I prefer your idea (2), which, if it
works OK in your case, will not break any other currently-working
incarnations.

Also, just to clarify, when you say "Raise the error...",
node_table_init() only makes an "error(NOTE, ...)" call, so you
would simply get a "NOTE: ..." message displayed if CRASHDEBUG(2),
and the crash session would still continue. That's also what we
would want in this case, unlike the session-ending "error(FATAL, ...)"
that you're seeing now...

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
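
For reference, idea (2) could look roughly like the sketch below. This is not the actual crash source: struct vm_table_sketch, debug_level, CRASHDEBUG_SKETCH(), count_online_bits() and reconcile_numnodes() are hypothetical stand-ins made up for illustration, and the NOTE message text is only a guess. The point is just that a mismatch between the bit count taken from node_online_map and the count from walking pgdat_list gets reported at debug level 2 and then corrected, instead of ending the session.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant part of crash's vm_table. */
struct vm_table_sketch {
    int numnodes;   /* node count first derived from node_online_map */
};

/* Hypothetical stand-in for crash's debug level and CRASHDEBUG() macro. */
static int debug_level = 0;
#define CRASHDEBUG_SKETCH(x) (debug_level >= (x))

/* Count the set bits in a node_online_map-style bitmask. */
static int count_online_bits(unsigned long map)
{
    int n = 0;

    while (map) {
        n += (int)(map & 1UL);
        map >>= 1;
    }
    return n;
}

/*
 * Reconcile the stored node count with the count 'n' obtained by
 * walking the pgdat list. On a mismatch, print a NOTE only at debug
 * level 2 or higher, then adopt 'n' rather than aborting.
 * Returns 1 if numnodes was changed, 0 otherwise.
 */
static int reconcile_numnodes(struct vm_table_sketch *vt, int n)
{
    assert(vt != NULL);

    if (n == vt->numnodes)
        return 0;

    if (CRASHDEBUG_SKETCH(2))
        fprintf(stderr, "NOTE: changing numnodes from %d to %d\n",
                vt->numnodes, n);

    vt->numnodes = n;
    return 1;
}
```

With a made-up mask of 0x11 (bits 0 and 4 set), count_online_bits() gives 2. If the list walk then found a different number of nodes, reconcile_numnodes() would adopt the walked count and the session would carry on.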