Re: Problem with NUMA Nodes


 



Dave,
Thanks for the feedback. I am attaching the patch as per our discussion; it has been tested and is working. Please have a look and let me know your opinion.
Thanks,
Sharyathi N

Dave Anderson wrote:
sharyathi nagesh wrote:

Hi
    I am seeing a problem with the crash tool on a system with NUMA nodes.
crash exits with an error message, and no further analysis of the dump is possible.
=====
Error message:

cassinilp1:~ # crash

crash 4.0-3.14
Copyright (C) 2002, 2003, 2004, 2005, 2006  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005  Fujitsu Limited
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...

crash: numnodes out of sync with pgdat_list?

=====
The system configuration is:

Node 0 Memory:
Node 1 Memory:
Node 2 Memory:
Node 3 Memory:
Node 4 Memory: 0x0-0x180000000

Node 0 CPUs: 0
Node 1 CPUs:
Node 2 CPUs:
Node 3 CPUs:
Node 4 CPUs: 1
=====
The error is raised by this check in the dump_memory_nodes() function in memory.c:

        if (n != vt->numnodes)
                error(FATAL, "numnodes out of sync with pgdat_list?\n");

        The check fails because of a mismatch between node_online_map and the number of nodes found by traversing pgdat_list.
The bits of node_online_map are set differently in kernel versions 2.6.16 and 2.6.19.
        In the earlier version, every bit from the first bit to the nth bit is set to '1',
where n is the last node to which memory is assigned.
        In the later version, a node is considered online if either memory or a CPU is allocated to it (or both).
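
The two ways of populating the online map described above can be modelled with a small, self-contained sketch. This is only a toy illustration of the description in this message, not kernel or crash code: `struct node_info`, `online_map_contiguous()`, `online_map_sparse()` and `count_bits()` are hypothetical names invented for the example.

```c
#include <stdbool.h>

/* Hypothetical per-node description, used only for this sketch. */
struct node_info {
    bool has_memory;
    bool has_cpu;
};

/* Earlier behaviour as described above: every bit from node 0 up to
 * the last node that has memory assigned is set. */
unsigned long online_map_contiguous(const struct node_info *nodes, int count)
{
    int last = -1;
    unsigned long map = 0;

    for (int i = 0; i < count; i++)
        if (nodes[i].has_memory)
            last = i;
    for (int i = 0; i <= last; i++)
        map |= 1UL << i;
    return map;
}

/* Later behaviour as described above: a node is online if it has
 * memory, a CPU, or both. */
unsigned long online_map_sparse(const struct node_info *nodes, int count)
{
    unsigned long map = 0;

    for (int i = 0; i < count; i++)
        if (nodes[i].has_memory || nodes[i].has_cpu)
            map |= 1UL << i;
    return map;
}

/* Number of set bits, i.e. the node count implied by a map. */
int count_bits(unsigned long map)
{
    int n = 0;

    while (map) {
        n += (int)(map & 1UL);
        map >>= 1;
    }
    return n;
}
```

For the configuration listed above (memory only on node 4, CPUs on nodes 0 and 4), the first function yields 0x1f (five nodes) and the second yields 0x11 (two nodes), which shows how the same hardware can imply different node counts.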

So I need your suggestion on how to fix the problem.
A few ideas I had:
1) If KERNEL_VERSION <= 2.6.16, increment vt->numnodes only if the bits of node_online_map and cpu_online_map are set.
   If KERNEL_VERSION > 2.6.16, use only node_online_map.
        (This will partly solve the problem.)
2) Or, as in node_table_init(), raise the error only when CRASHDEBUG(2) is set; otherwise update vt->numnodes with 'n'.

Please let me know your opinion.
Regards
Sharyathi Nagesh


Hi Sharyathi,

Thanks a lot for debugging this.

I prefer your idea (2), which, if it works OK in your case, will not break
any other currently-working incarnations.

Also, just to clarify, when you say "Raise the error...", node_table_init()
only makes an "error(NOTE, ...)" call, so you would simply get a "NOTE: ..."
message displayed if CRASHDEBUG(2), and the crash session would
still continue.  That's also what we would want in this case, unlike the
"error(FATAL, ...)", session-ending, error that you're seeing now...

Thanks,
  Dave
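
The non-fatal handling discussed in this reply can be sketched outside of crash as follows. This is a minimal illustration, not the crash source: `reconcile_numnodes()` and its `debug_level` parameter are hypothetical, and the sketch assumes that CRASHDEBUG(2) corresponds to a debug level of at least 2.

```c
#include <stdio.h>

/* Hypothetical helper: given the node count recorded earlier (numnodes)
 * and the count found by walking the node list (walked), return the
 * count to use from now on.  A mismatch is never fatal; it only
 * produces a "NOTE:" message when the debug level is 2 or higher
 * (an assumption standing in for CRASHDEBUG(2)). */
int reconcile_numnodes(int numnodes, int walked, int debug_level)
{
    if (walked != numnodes) {
        if (debug_level >= 2)
            fprintf(stderr, "NOTE: changing numnodes from %d to %d\n",
                    numnodes, walked);
        return walked;   /* trust the walked count and carry on */
    }
    return numnodes;
}
```

A caller would assign the result back to its stored node count, so the session continues with the count obtained from the walk instead of exiting.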


--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility

Index: crash-4.0-3.21/memory.c
===================================================================
--- crash-4.0-3.21.orig/memory.c	2007-03-16 14:35:05.000000000 -0500
+++ crash-4.0-3.21/memory.c	2007-05-02 00:43:55.000000000 -0500
@@ -11667,8 +11667,12 @@
 		}
 	} 
 
-	if (n != vt->numnodes)
-		error(FATAL, "numnodes out of sync with pgdat_list?\n");
+	if (n != vt->numnodes) {
+		if (CRASHDEBUG(2))
+			error(NOTE, "changing numnodes from %d to %d\n",
+				vt->numnodes, n);
+		vt->numnodes = n;
+	}
 
 	if (!initialize && IS_SPARSEMEM())
 		dump_mem_sections();