On Wed, 2010-02-10 at 14:01 -0500, Dave Anderson wrote:
> ----- "Michael Holzheu" <holzheu@xxxxxxxxxxxxxxxxxx> wrote:
>
> > > > It shows all swapper tasks (online and offline), but I get errors for
> > > > the backtrace for the offline CPUs.
> > >
> > > What kind of errors?
> >
> > The problem is that for the offline swapper tasks
> > s390x_get_stack_frame() is called. In that function I check with
> > s390x_has_cpu() if the task is currently running on a CPU. Because of
> > the missing CPU online check, s390x_has_cpu() returns TRUE. Therefore I
> > try to read the CPU registers from the lowcore of that CPU. The lowcore
> > pointer is zero, because the CPU is offline. Therefore the read stack
> > pointer (register 15) is wrong and the backtrace fails.
> >
> > > > The attached patch would solve the problem (and eliminate most of the
> > > > probably redundant s390(x)_has_cpu() function.
> > >
> > > I don't see what's being solved by the patch (not the s390x_get_smp_cpus
> > > parts) -- does the "old" s390x_has_cpu() fail?
> >
> > The old s390x_has_cpu() returns TRUE for the offline swapper tasks. And
> > I think that this is wrong.
>
> Hmmm... To me, it is TRUE, i.e., the existing-but-idle swapper task for
> an offline cpu actually *does* own that cpu.
>
> And that's why I was wondering about what error message gets shown.
>
> > The new implementation of s390x_has_cpu() should return TRUE if the task
> > is running on a online CPU and FALSE otherwise:
> >
> > + if (is_task_active(bt->task) && (kt->cpu_flags[cpu] & ONLINE))
> > +         return TRUE;
> > + else
> > +         return FALSE;
>
> This is probably OK, although I am slightly hesitant about throwing out all
> of the old backwards-compatibility code in the s390[x]_has_cpu() functions.

Why? The "is_task_active()" function must also work on all supported
kernel levels. Otherwise crash would probably fail in other s390
independent functions, wouldn't it?
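
To make the behaviour of the proposed check concrete, here is a minimal
standalone sketch. It is not crash source code: the *_sketch structs, the
ONLINE_SKETCH flag value and is_task_active_sketch() are hypothetical
stand-ins for bt, kt->cpu_flags[], ONLINE and is_task_active(), which are
assumed rather than copied from crash.

```c
#include <assert.h>
#include <stdio.h>

#define TRUE            1
#define FALSE           0
#define NR_CPUS_SKETCH  4
#define ONLINE_SKETCH   0x1     /* hypothetical flag value, not crash's ONLINE */

/* Hypothetical stand-in for the per-CPU flags kept in crash's kernel table. */
struct kernel_table_sketch {
        unsigned char cpu_flags[NR_CPUS_SKETCH];
};

/* Hypothetical stand-in for the task a backtrace is requested for. */
struct task_sketch {
        int cpu;        /* CPU the task is assigned to */
        int active;     /* non-zero if the task is the active task on that CPU */
};

/* Stand-in for is_task_active(); here it only reads the flag stored above. */
int is_task_active_sketch(const struct task_sketch *task)
{
        return task->active != 0;
}

/*
 * The proposed rule: a task "has" a CPU only if it is the active task
 * on that CPU and the CPU is marked online.
 */
int has_cpu_sketch(const struct kernel_table_sketch *kt,
                   const struct task_sketch *task)
{
        if (task->cpu < 0 || task->cpu >= NR_CPUS_SKETCH)
                return FALSE;
        if (is_task_active_sketch(task) &&
            (kt->cpu_flags[task->cpu] & ONLINE_SKETCH))
                return TRUE;
        return FALSE;
}

/* Walk through the case from this thread: CPUs 0-1 online, 2-3 offline. */
void has_cpu_demo(void)
{
        struct kernel_table_sketch kt = {
                { ONLINE_SKETCH, ONLINE_SKETCH, 0, 0 }
        };
        struct task_sketch swapper_cpu0 = { 0, 1 };
        struct task_sketch swapper_cpu2 = { 2, 1 };

        assert(has_cpu_sketch(&kt, &swapper_cpu0) == TRUE);
        assert(has_cpu_sketch(&kt, &swapper_cpu2) == FALSE);
        printf("cpu0 swapper: %d, cpu2 swapper: %d\n",
               has_cpu_sketch(&kt, &swapper_cpu0),
               has_cpu_sketch(&kt, &swapper_cpu2));
}
```

With this rule the swapper task of an offline CPU is handled like any
other task that is not running, so the registers would not be read from
a lowcore pointer that is zero.
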
Of course, we could also keep my old code and add the online check to
the old code.

> I thought maybe it would be safer to leave well enough alone, and not
> worry about any error messages from backtraces of offline cpus.
> It might be even more useful that there are error messages to alert
> the user that the cpu is not online?

The following shows the output of "bt -a" without the patch:

PID: 0      TASK: 18d38340          CPU: 2   COMMAND: "swapper"
bt: invalid kernel virtual address: ffffffffffffc000  type: "async_stack"

PID: 0      TASK: 18d40440          CPU: 3   COMMAND: "swapper"
bt: invalid kernel virtual address: ffffffffffffc000  type: "async_stack"

We can't leave it like that. With my patch we at least get a correct
stack backtrace:

PID: 0      TASK: 18d38340          CPU: 2   COMMAND: "swapper"
 #0 [18d3feb8] ret_from_fork at 117e12

What does a backtrace of offline CPUs look like on other architectures?

Michael