On Mon, 2010-03-08 at 09:49 -0500, Dave Anderson wrote:
> ----- "Dave Anderson" <anderson@xxxxxxxxxx> wrote:
>
> > ----- "Luciano Chavez" <lnx1138@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Hi Dave,
> > >
> > > Thinking about backward compatibility, would displaying "ONLINE CPUS"
> > > still seem OK for the case where kernel_init() finds the smp_num_cpus
> > > symbol (as for a 2.4 kernel)? Before there were the various cpu maps, I
> > > think smp_num_cpus was analogous to the possible cpus as opposed to
> > > online. I can see this requiring some thought as to what CPUS in the
> > > output means when you have various different maps now (online, possible,
> > > and present). That being said, it would be good to leave no doubt and
> > > explicitly state the count is for the present or online CPUS with the
> > > latter being my suggestion.
> > >
> > > I forgot to mention that I suspect the problem I mentioned before would
> > > get stranger for POWER7 which offers 4 threads per core. I didn't have
> > > access to a POWER7 machine to see just what it would do if we tried
> > > disabling SMT as before but it follows the same pattern the count
> > > displayed would be way off from the online count.
> >
> > I just ran through a bunch of stashed dumpfiles I have on hand, and
> > it gets even murkier when dealing with Xen or KVM kernels, because
> > as part of the post-crash shutdown (or forced dump), all but one of
> > the cpus may be taken "offline". So even though there may be 4 vcpus,
> > and crash correctly shows 4 "CPUS", the cpu_online_map shows only one
> > cpu bit. So if we went ahead and displayed a number based upon the
> > cpu_online_map, it would be completely misleading. Incorrect
> > actually...
>
> You can always dump the possible/present/online map information with
> the "help -k" debug option.
>
> So for example, taking a 2.6.9-era (RHEL4) xen kernel that crashed
> on vcpu 3 due to a NULL reference, the hypervisor made a callback to
> the other vcpus to shut them down prior to the core dumping procedure:
>
> crash> help -k
> ...
> cpu_possible_map: (does not exist)
> cpu_present_map: 0 1 2 3
> cpu_online_map: 3
> ...
>
> So the online map cannot be used for the cpu count, and for that
> matter, it wouldn't make any sense to even display the online map
> count.
>
> In any case, for now I prefer not to change things, at least for the
> other architectures.
>
> That being said, I defer machine-specific items for ppc64, s390
> and s390x to the IBM maintainers, and to HP for ia64. (The ppc
> and alpha architectures have no active "maintainers" any more,
> so those arches are pretty much withering on the vine.)
>
> So if you want to do something specifically for ppc64, please
> re-post a patch for just that architecture.
>
> Dave
>

Dave,

Thanks for taking a good look at the many cases that would make a
general solution based on the online cpu count messy. I originally
wanted this change to apply only to ppc64. The catch was that a
ppc64-only change could affect only ppc64_display_machine_stats(); to
keep the displayed value consistent, display_sys_stats() and
dump_kernel_table() had to change as well.

So, rethinking this as a ppc64-specific change in which CPUS is
displayed as the online count when possible, while every other
architecture keeps doing what it does now (displaying kt->cpus), I
suggest the following:

1. Add get_cpus_to_display as a machdep function.
2. For ppc64, initialize machdep->get_cpus_to_display to
   ppc64_get_cpus_to_display(), which will attempt to use
   get_cpus_online() and fall back to kt->cpus.
3. For all other architectures, initialize
   machdep->get_cpus_to_display to generic_get_cpus_to_display(),
   which returns kt->cpus to maintain the status quo of the code as it
   is now.
4. In kernel.c, replace kt->cpus in display_sys_stats() and
   dump_kernel_table() with a call to machdep->get_cpus_to_display()
   when displaying CPUS.

Let me know what you think. I think this solution leaves room for
other architectures to individually change what they display for the
cpu count, should they need to in the future.

regards,
--
Luciano Chavez <lnx1138@xxxxxxxxxxxxxxxxxx>
IBM Linux Technology Center

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
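
To make the four steps proposed above concrete, here is a rough
standalone sketch, not a patch against crash. The two structs are
pared-down stand-ins for the real kernel_table and machdep_table,
get_cpus_online() is stubbed with a test variable (stub_online_cpus),
and display_cpus_line() is a hypothetical stand-in for the CPUS line
printed by display_sys_stats(). I am also assuming that a return of 0
from get_cpus_online() means the online count could not be determined.

```c
/*
 * Standalone sketch of the proposed machdep hook. Everything here is a
 * simplified stand-in for illustration, not the actual crash source.
 */
#include <stdio.h>

struct kernel_table { int cpus; };          /* stand-in: only the cpu count */
struct machdep_table {
    int (*get_cpus_to_display)(void);       /* step 1: the new machdep hook */
};

static struct kernel_table kernel_table = { 0 };
static struct kernel_table *kt = &kernel_table;
static struct machdep_table machdep_table = { 0 };
static struct machdep_table *machdep = &machdep_table;

/* Stub (assumption): 0 means the online count is unavailable. */
static int stub_online_cpus = 0;
static int get_cpus_online(void) { return stub_online_cpus; }

/* Step 3: all other architectures keep reporting kt->cpus. */
static int generic_get_cpus_to_display(void)
{
    return kt->cpus;
}

/* Step 2: ppc64 prefers the online count and falls back to kt->cpus. */
static int ppc64_get_cpus_to_display(void)
{
    int online = get_cpus_online();

    return online > 0 ? online : kt->cpus;
}

/* Step 4: display code calls the hook instead of reading kt->cpus. */
static void display_cpus_line(FILE *fp)
{
    fprintf(fp, "        CPUS: %d\n", machdep->get_cpus_to_display());
}
```

Each architecture's init code would set machdep->get_cpus_to_display
once, and the display code would never need to know which variant is
installed. That is what keeps the change confined to ppc64 while
leaving the output of every other architecture as it is today.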