----- "Jeffrey Hagen" <Jeffrey.Hagen@xxxxxxxxxxxx> wrote:

> Hi Petr and Dave,
>
> I have a couple of comments on Petr's email regarding CPU count.
>
> When the dump is the result of an NMI (nmi switch pressed) due to a hung
> system, one often needs to analyze the state and backtrace for all the
> CPU's. Since the kernel halts all but CPU0, the crash utility cannot
> see the other "offline" CPU's.

I've never seen that behavior before. Probably because I've never
seen an x86_64 dumpfile that was created as a result of the NMI
switch being pressed? Anyway, are you saying that the NMI switch
shutdown handler takes the other cpus offline?

> This behavior has changed for the x86 architecture somewhere between
> 2.6.16 (SLES10) and 2.6.32 (SLES11) due to the removal of the x8664_pda
> structure.
> The function x86_64_init (in x86_64.c) now calls x86_64_per_cpu_init
> which doesn't count the offline CPUS when calculating the number of
> CPU's. Previously, x86_64_cpu_pda_init (called if x8664_pda exists),
> didn't check for online/offline status.

Again -- I've never seen this behaviour before. In any case, I'll
look at any patch suggestions you guys have in mind.

Thanks,
  Dave

> Regarding #3 in Petr's email. It appears that the set command won't
> accept a value >= kt_cpus (number of CPUS). It doesn't check if the CPU
> is offline or not.
>
> Thanks,
>
> Jeff Hagen
>
>
> > Hi all,
> >
> > before making a larger cleanup, I want to ask here for your opinion. It
> > seems that there is quite a bit of confusion about the meaning of CPU
> > count printed out by the crash utility.
> >
> > 1. Number of CPUs
> >
> > Some people think that crash should always output the number of CPUs in
> > the system (ie. a quad-core server should always output 'CPUS: 4'),
> > while other people think that only online CPUs should be counted.
> >
> > 2. CPU numbering
> >
> > For example, if there are 4 CPUs in the system, but some of them are
> > taken offline (e.g. CPU 1 and CPU 3), _and_ crash output the number of
> > online CPUs, it would print out 'CPUS: 2'. It's not easy to find out
> > that valid CPU numbers are 0 and 2 in this case.
>
> Hi Petr,
>
> For all but ppc64, the number shown by the initial banner and the
> "sys" command is essentially "the-highest-cpu-number-plus-one".
> For ppc64 (as requested and implemented by the IBM/ppc64 maintainers),
> it shows the number of online cpus. There's reasons for doing it
> either of the two ways, but I'm on vacation now, and you can research
> the list archives for the various arguments for-and-against doing it
> either way. Check the changelog.html for when it was changed for
> ppc64, and then cross-reference the revision date with the list
> archives.
>
> > 3. Examining offline CPU
> >
> > Sometimes, it may be useful to examine the state of an offline CPU. Now,
> > I know that the saved state is most likely stale, but it can be useful
> > in some cases (e.g. a crash after dropping to kdb). The crash utility
> > currently refuses to select an offline CPU with 'set -c #'. Are there
> > any concerns about allowing it?
>
> I tend to agree with you, but the only thing that's useful and
> available from an offline cpu is the swapper task for that cpu
> and the runqueue for that cpu. And both of those entities are
> readily accessible if you really need them. Although I don't know
> anything about kdb status, so maybe there's something of per-cpu
> interest, but I don't know why it would be necessary to "set"
> that cpu?
>
> In any case, like I said before, I'm just temporarily online while
> on vacation, and will be back to work on the 9th.
>
> Thanks,
>   Dave
>
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility