Hi Aditya,

On Sun, Mar 17, 2024 at 03:07:44PM +0530, Aditya Gupta wrote:
> Hi Tao,
>
> On Fri, Mar 15, 2024 at 08:33:31PM +0800, Tao Liu wrote:
> > Hi Aditya,
> >
> > > > As we can see, other cpus[1-6, 8-23] just take the reg cache of
> > > > cpu[7], which is incorrect. And if users go further like, "thread 20"
> > > > and "gdb bt", it will also give incorrect stack traces.
> > > >
> > > > The cpu cache will only get refreshed once user type "set <pid>", so
> > > > the cpu cache will be refreshed by the <pid> task's context.
> > > >
> > > > I doubt a user will understand all the details and constraints, so I'm
> > > > afraid the user will be confused by the faulty output. But I also have
> > > > no objection if the performance is the priority. Basically it is a
> > > > balance of pays and gains. In addition, since cmd "info" and "thread"
> > > > is a command provided by gdb, currently I don't know how to hack
> > > > those, so cpu cache can be refreshed when "info threads" or "thread
> > > > <num>" have been invoked.
> > > >
> > > > Do you have any thoughts?
> > >
> > > I also had faced that issue initially, ie. the other CPUs using up same
> > > regcache, if all are not refreshed.
> > > While iterating through all threads, gdb switches it's context
> > > temporarily, while crash's context remained same, thus causing gdb to
> > > get same registers for all threads other than 0.
> > >
> > > This was solved in patch #3 (synchronise cpu context changes between crash/gdb)
> > > in the ppc's 'Improve stack unwind on ppc64' series, by syncing gdb's
> > > context with crash.
> > >
> > > Can this change in thread.c in gdb-10.2.patch in patch #2 be reverted ?
> > > That will fix it.
> >
> > Could you share your patch, based on your v10 and my v1 patch series,
> > so I can get a clue how to do this?
>
> Sure tao, i will attach it to the end of this mail.
> Basically what I did is to revert changes to gdb-10.2.patch in this
> patch.
> I pushed it along with testing with only regcache_refresh for CPU
> 0 instead of all CPUs, to:
>
> https://github.com/adi-g15-ibm/crash/tree/tmp-test-branch-10928
>

I have tried your repo and noticed the following behaviour. I am not sure
whether it is expected: cmd "info threads" will always refresh the CPUs'
regcache to that of the active tasks, right? E.g.:

crash> thread 15
[Switching to thread 15 (CPU 14)]
#0  <unavailable> in ?? ()
crash> bt
PID: 29867  TASK: ffff88025b04af70  CPU: 14  COMMAND: "elasticsearch[l"
...
crash> gdb bt
#0  <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
        <== pid 29867's regcache in CPU14

crash> set ffff88003592cf10
crash> bt
PID: 835    TASK: ffff88003592cf10  CPU: 14  COMMAND: "kdmflush"
 #0 [ffff880fd6fc7da8] __schedule at ffffffff816a8f65
...
crash> gdb bt
#0  0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff88003592cf10) at kernel/sched/core.c:2527
#1  __schedule () at kernel/sched/core.c:3540
        <== pid 835's regcache in CPU14
...

crash> info threads
...
crash> bt
PID: 29867  TASK: ffff88025b04af70  CPU: 14  COMMAND: "elasticsearch[l"
...
crash> gdb bt
#0  <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
        <== pid 29867's regcache in CPU14

Frankly, I would expect the task context to remain pid 835's after
"info threads": I previously typed the command "set XX" to switch to it, so
I would assume the context stays unchanged until I type cmd "set YY".

What do you think?

Thanks,
Tao Liu

> >
> > I tried but was unsuccessful. Since I have changed your #3 patch a bit
> > in my v1 patch series, such as gdb_change_cpu_context() ->
> > gdb_change_thread_context(), I doubt that's the reason for failing.
> >
> > What I did is keeping "set_cpu" in thread.c:thread_command() as the
> > gdb-10.2.patch describes in your #3 patch.
> > But only one thread gets
> > refreshed when I invoke "thread X", and no regcache refreshed when
> > invoke "info threads".
>
> If i understand clearly, "thread X" causing refresh for one thread/CPU
> is expected, as we want only registers for "X" to be refreshed.
> But 'info threads' not refreshing any regcache should be solved by the
> restoring changes to gdb-10.2.patch to do the 'set_cpu' in the
> thread_command.
>
> Thanks Tao,
>
> - Aditya Gupta
>
> commit d1ad22747de0b6c9846ecc8ea746ee9a38c7dade
> Author: Tao Liu <ltao@xxxxxxxxxx>
> Date:   Wed Feb 14 10:44:54 2024 +0800
>
>     change thread context
>
>     Previously we can only view the stack unwinding for the tasks which are
>     running on each CPUs. This patch will enable the ability to view
>     arbitrary tasks stack unwinding.
>
>     After crash get initialized, "info threads" will output like the
>     following:
>
>     crash> info threads
>       Id   Target Id         Frame
>       1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
>     ...
>     * 8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000, reserved=reserved@entry=false) at block/blk-mq.c:640
>     ...
>       13   CPU 12            <unavailable> in ?? ()
>       14   CPU 13            native_safe_halt () at arch/x86/include/asm/irqflags.h:54
>     ...
>
>     crash> ps
>        PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>     >     0      0   0  ffffffff819f9480  RU   0.0       0      0  [swapper/0]
>     >     0      0   1  ffff880169411fa0  RU   0.0       0      0  [swapper/1]
>     ...
>           0      0  23  ffff8801694e0000  RU   0.0       0      0  [swapper/23]
>           1      0  13  ffff880169b30000  IN   0.0  193052   4180  systemd
>
>     "info threads" show the tasks which are currently running on each CPU. If we'd
>     like to view systemd, which are not running, task's stack unwinding, we
>     do the following:
>
>     crash> set 1
>     or
>     crash> set ffff880169b30000
>
>     Then the register cache of systemd will be swapped into CPU 13:
>
>     crash> info threads
>     crash> info threads
>       Id   Target Id         Frame
>       1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
>     ...
>       8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000, reserved=reserved@entry=false) at block/blk-mq.c:640
>     ...
>       13   CPU 12            <unavailable> in ?? ()
>     * 14   CPU 13            0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000) at kernel/sched/core.c:2527
>     ...
>
>     And we can view the stack unwinding of systemd:
>
>     crash> bt
>     PID: 1      TASK: ffff880169b30000  CPU: 13  COMMAND: "systemd"
>      #0 [ffff880169b3bd58] __schedule at ffffffff816a8f65
>      #1 [ffff880169b3bdc0] schedule at ffffffff816a94e9
>      #2 [ffff880169b3bdd0] schedule_hrtimeout_range_clock at ffffffff816a86fd
>      #3 [ffff880169b3be68] schedule_hrtimeout_range at ffffffff816a8733
>      #4 [ffff880169b3be78] ep_poll at ffffffff8124bb7e
>      #5 [ffff880169b3bf30] sys_epoll_wait at ffffffff8124d00d
>      #6 [ffff880169b3bf80] system_call_fastpath at ffffffff816b5009
>         RIP: 00007f0449407923  RSP: 00007ffc35a3c378  RFLAGS: 00010246
>         RAX: 00000000000000e8  RBX: ffffffff816b5009  RCX: 0000000000000071
>         RDX: 000000000000001d  RSI: 00007ffc35a3d5a0  RDI: 0000000000000004
>         RBP: 00007ffc35a3d810   R8: 0000000000000000   R9: 0000000000000000
>         R10: 00000000ffffffff  R11: 0000000000000293  R12: 0000563ca2ebe980
>         R13: 0000000000000003  R14: ffffffffffffffff  R15: 0000000000000001
>         ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
>     crash> gdb bt
>     #0  0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000) at kernel/sched/core.c:2527
>     #1  __schedule () at kernel/sched/core.c:3540
>     #2  0xffffffff816a94e9 in schedule () at kernel/sched/core.c:3577
>     #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock (expires=expires@entry=0x0, delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS, clock=clock@entry=1) at kernel/hrtimer.c:1724
>     #4  0xffffffff816a8733 in schedule_hrtimeout_range (expires=expires@entry=0x0, delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/hrtimer.c:1778
>     #5  0xffffffff8124bb7e in ep_poll (ep=0xffff880fd861f8c0, events=events@entry=0x7ffc35a3d5a0, maxevents=maxevents@entry=29,
>         timeout=timeout@entry=-1) at fs/eventpoll.c:1669
>     #6  0xffffffff8124d00d in SYSC_epoll_wait (timeout=<optimized out>, maxevents=29, events=<optimized out>, epfd=<optimized out>) at fs/eventpoll.c:2043
>     #7  SyS_epoll_wait (epfd=<optimized out>, events=140721208415648, maxevents=29, timeout=4294967295) at fs/eventpoll.c:2008
>     #8  <signal handler called>
>     #9  0x00007f0449407923 in ?? ()
>
>     Signed-off-by: Tao Liu <ltao@xxxxxxxxxx>
>     Signed-off-by: Aditya Gupta <adityag@xxxxxxxxxxxxx>
>
> diff --git a/crash_target.c b/crash_target.c
> index d06383f594aa..1df1e9d34a45 100644
> --- a/crash_target.c
> +++ b/crash_target.c
> @@ -29,10 +29,10 @@ extern "C" int gdb_readmem_callback(unsigned long, void *, int, int);
>  extern "C" int crash_get_nr_cpus(void);
>  extern "C" int crash_get_cpu_reg (int cpu, int regno, const char *regname,
>                                    int regsize, void *val);
> -extern "C" int gdb_change_cpu_context (unsigned int cpu);
>  extern "C" void gdb_refresh_regcache(unsigned int cpu);
>  extern "C" int set_cpu(int cpu, int print_context);
> -
> +extern "C" int crash_set_thread(ulong);
> +extern "C" int gdb_change_thread_context (ulong task);
>
>  /* The crash target.  */
>
> @@ -110,11 +110,13 @@ crash_target::xfer_partial (enum target_object object, const char *annex,
>
>  #define CRASH_INFERIOR_PID 1
>
> +crash_target *target = NULL;
> +
>  void
>  crash_target_init (void)
>  {
>    int nr_cpus = crash_get_nr_cpus();
> -  crash_target *target = new crash_target ();
> +  target = new crash_target ();
>
>    /* Own the target until it is successfully pushed.
>     */
>    target_ops_up target_holder (target);
> @@ -137,27 +139,33 @@ crash_target_init (void)
>    reinit_frame_cache ();
>  }
>
> -/*
> - * Change gdb's thread context to the thread on given CPU
> - **/
>  extern "C" int
> -gdb_change_cpu_context(unsigned int cpu)
> +gdb_change_thread_context (ulong task)
>  {
> +  int tried = 0;
> +  inferior* inf = current_inferior ();
> +  int cpu = crash_set_thread(task);
> +  if (cpu < 0)
> +    return FALSE;
> +
>    ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
> -  inferior *inf = current_inferior ();
> +
> +retry:
>    thread_info *tp = find_thread_ptid (inf, ptid);
> +  if (tp == nullptr && !tried) {
> +    thread_info *thread = add_thread_silent(target, ptid_t(CRASH_INFERIOR_PID, 0, cpu));
> +    tried++;
> +    if (thread) {
> +      goto retry;
> +    }
> +  }
>
> -  if (tp == nullptr)
> +  if (tp == nullptr && tried)
>      return FALSE;
>
> -  /* Making sure that crash's context is same */
> -  set_cpu(cpu, FALSE);
> -
> -  /* Switch to the thread */
> +  target_fetch_registers(get_thread_regcache(tp), -1);
>    switch_to_thread(tp);
> -
> -  /* Fetch/Refresh thread's registers */
> -  gdb_refresh_regcache(cpu);
> +  reinit_frame_cache ();
>
>    return TRUE;
>  }
> diff --git a/defs.h b/defs.h
> index 49b606979d9e..d5cef621b465 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -8192,7 +8192,6 @@ enum ppc64_regnum {
>  };
>
>  /* crash_target.c */
> -extern int gdb_change_cpu_context (unsigned int cpu);
>  extern void gdb_refresh_regcache (unsigned int cpu);
>
>  #endif /* !GDB_COMMON */
> diff --git a/kernel.c b/kernel.c
> index ea5b5cb32914..50832ed906e5 100644
> --- a/kernel.c
> +++ b/kernel.c
> @@ -6544,6 +6544,29 @@ set_cpu(int cpu, int print_context)
>          show_context(CURRENT_CONTEXT());
>  }
>
> +int
> +crash_set_thread(ulong task)
> +{
> +        bool found = FALSE;
> +        struct task_context *tc = FIRST_CONTEXT();
> +
> +        for (int i = 0; i < RUNNING_TASKS(); i++, tc++) {
> +                if (tc->task == task) {
> +                        found = TRUE;
> +                        break;
> +                }
> +        }
> +
> +        if (!found)
> +                return -1;
> +
> +        if
> +            (CURRENT_TASK() == tc->task)
> +                return tc->processor;
> +
> +        set_context(tc->task, NO_PID);
> +        return tc->processor;
> +}
> +
>
>  /*
>   * Collect the irq_desc[] entry along with its associated handler and
> diff --git a/task.c b/task.c
> index a405b05a47d1..ef79f533f11a 100644
> --- a/task.c
> +++ b/task.c
> @@ -715,7 +715,8 @@ task_init(void)
>           * crash_target::fetch_registers, so CPU 0's registers are shown as
>           * <unavailable> in gdb mode
>           * */
> -        gdb_refresh_regcache(0);
> +        for (int i = 0; i < get_cpus_online(); i++)
> +                gdb_refresh_regcache(i);
>
>          tt->flags |= TASK_INIT_DONE;
>  }
> @@ -5315,7 +5316,7 @@ set_context(ulong task, ulong pid, uint update_gdb_thread)
>
>                  /* change the selected thread in gdb, according to current context */
>                  if (update_gdb_thread)
> -                        return gdb_change_cpu_context(tc->processor);
> +                        return gdb_change_thread_context(tc->task);
>                  else
>                          return TRUE;
>          } else {
>
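
P.S. To make the "info threads" question concrete, here is a toy model in C
of the two behaviours. It is only a sketch and not crash or gdb code: struct
toy_state, toy_set(), toy_refresh_all() and toy_refresh_keep_selected() are
names made up for this mail, NR_CPUS is an arbitrary small value, and a
"register cache" is reduced to the pid it belongs to. toy_refresh_all()
mirrors what I observe after "info threads"; toy_refresh_keep_selected() is
the behaviour I would expect.

```c
#include <assert.h>

#define NR_CPUS 4   /* toy value, not taken from the dump discussed above */

/* Hypothetical model: one cached "register set" per CPU, reduced to the pid
 * of the task whose registers currently sit in that slot. */
struct toy_state {
    int active_pid[NR_CPUS];  /* task running on each CPU at crash time      */
    int cached_pid[NR_CPUS];  /* task whose registers the per-CPU slot holds */
    int selected_cpu;         /* slot taken over by the last "set", or -1    */
};

/* Start with every slot holding the active task of its CPU. */
static void toy_init(struct toy_state *s, const int *active)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        s->active_pid[cpu] = active[cpu];
        s->cached_pid[cpu] = active[cpu];
    }
    s->selected_cpu = -1;
}

/* Models "set <pid>": the chosen task's registers replace its CPU's slot. */
static void toy_set(struct toy_state *s, int pid, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    s->cached_pid[cpu] = pid;
    s->selected_cpu = cpu;
}

/* Models the observed behaviour: every slot goes back to the active task,
 * so the task chosen with "set" is lost. */
static void toy_refresh_all(struct toy_state *s)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        s->cached_pid[cpu] = s->active_pid[cpu];
    s->selected_cpu = -1;
}

/* Models the expected behaviour: the slot chosen by "set" is left alone. */
static void toy_refresh_keep_selected(struct toy_state *s)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu != s->selected_cpu)
            s->cached_pid[cpu] = s->active_pid[cpu];
}
```

With pid 29867 active on toy CPU 2, calling toy_set(&s, 835, 2) and then
toy_refresh_keep_selected(&s) leaves pid 835 in slot 2, whereas
toy_refresh_all(&s) puts pid 29867 back, which is the surprise I described
above.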