[Crash-utility] Re: [PATCH 2/5] Enable crash to change gdb thread context

Tao Liu <ltao@xxxxxxxxxx> · Mon, 18 Mar 2024 17:18:56 +0800

Hi Aditya,

On Sun, Mar 17, 2024 at 03:07:44PM +0530, Aditya Gupta wrote:
> Hi Tao,
> 
> On Fri, Mar 15, 2024 at 08:33:31PM +0800, Tao Liu wrote:
> > Hi Aditya,
> > 
> > > > As we can see, other cpus[1-6, 8-23] just take the reg cache of
> > > > cpu[7], which is incorrect. And if users go further like, "thread 20"
> > > > and "gdb bt", it will also give incorrect stack traces.
> > > >
> > > > The cpu cache will only get refreshed once user type "set <pid>", so
> > > > the cpu cache will be refreshed by the <pid> task's context.
> > > >
> > > > I doubt a user will understand all the details and constraints, so I'm
> > > > afraid the user will be confused by the faulty output. But I also have
> > > > no objection if the performance is the priority. Basically it is a
> > > > balance of pays and gains. In addition, since cmd "info" and "thread"
> > > > is a command provided by gdb, currently I don't know how to hack
> > > > those, so cpu cache can be refreshed when "info threads" or "thread
> > > > <num>" have been invoked.
> > > >
> > > > Do you have any thoughts?
> > >
> > > I also had faced that issue initially, ie. the other CPUs using up same
> > > regcache, if all are not refreshed.
> > > While iterating through all threads, gdb switches it's context
> > > temporarily, while crash's context remained same, thus causing gdb to
> > > get same registers for all threads other than 0.
> > >
> > > This was solved in patch #3 (synchronise cpu context changes between crash/gdb)
> > > in the ppc's 'Improve stack unwind on ppc64' series, by syncing gdb's
> > > context with crash.
> > >
> > > Can this change in thread.c in gdb-10.2.patch in patch #2 be reverted ?
> > > That will fix it.
> > 
> > Could you share your patch, based on your v10 and my v1 patch series,
> > so I can get a clue how to do this?
> 
> Sure tao, i will attach it to the end of this mail.
> Basically what I did is to revert changes to gdb-10.2.patch in this
> patch. I pushed it along with testing with only regcache_refresh for CPU
> 0 instead of all CPUs, to:
> 
> https://github.com/adi-g15-ibm/crash/tree/tmp-test-branch-10928
> 
I have tried with your repo, and I noticed the following behaviour, not
sure if it is expected:

cmd "info threads" will always reflush the cpus regcache to be the active
tasks' right? E.g:

crash> thread 15
[Switching to thread 15 (CPU 14)]
#0  <unavailable> in ?? ()
crash> bt
PID: 29867    TASK: ffff88025b04af70  CPU: 14   COMMAND: "elasticsearch[l"
...
crash> gdb bt
#0  <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further <== pid 29867's regcache in CPU14
crash> set ffff88003592cf10
crash> bt
PID: 835      TASK: ffff88003592cf10  CPU: 14   COMMAND: "kdmflush"
 #0 [ffff880fd6fc7da8] __schedule at ffffffff816a8f65
...
crash> gdb bt
#0  0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff88003592cf10) at kernel/sched/core.c:2527
#1  __schedule () at kernel/sched/core.c:3540                                 <== pid 835's regcache in CPU14
...
crash> info threads
...
crash> bt
PID: 29867    TASK: ffff88025b04af70  CPU: 14   COMMAND: "elasticsearch[l"
...
crash> gdb bt
#0  <unavailable> in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further <== pid 29867's regcache in CPU14

Frankly I would expect the task context remains to be pid 835's after
"info threads", because I previously typed the command "set XX" to switch to
it, so I would assume the context stay unchange until I retype cmd "set YY".

What do you think?

Thanks,
Tao Liu

> > 
> > I tried but was unsuccessful. Since I have changed your #3 patch a bit
> > in my v1 patch series, such as gdb_change_cpu_context() ->
> > gdb_change_thread_context(), I doubt that's the reason for failing.
> > 
> > What I did is keeping "set_cpu" in thread.c:thread_command() as the
> > gdb-10.2.patch describes in your #3 patch. But only one thread gets
> > refreshed when I invoke "thread X", and no regcache refreshed when
> > invoke "info threads".
> 
> If i understand clearly, "thread X" causing refresh for one thread/CPU
> is expected, as we want only registers for "X" to be refreshed.
> But 'info threads' not refreshing any regcache should be solved by the
> restoring changes to gdb-10.2.patch to do the 'set_cpu' in the
> thread_command.
> 
> Thanks Tao,
> 
> - Aditya Gupta
> 
> commit d1ad22747de0b6c9846ecc8ea746ee9a38c7dade
> Author: Tao Liu <ltao@xxxxxxxxxx>
> Date:   Wed Feb 14 10:44:54 2024 +0800
> 
>     change thread context
>     
>     Previously we can only view the stack unwinding for the tasks which are
>     running on each CPUs. This patch will enable the ability to view
>     arbitrary tasks stack unwinding.
>     
>     After crash get initialized, "info threads" will output like the
>     following:
>     
>     crash> info threads
>       Id   Target Id         Frame
>       1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
>     ...
>     * 8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000, reserved=reserved@entry=false) at block/blk-mq.c:640
>     ...
>       13   CPU 12            <unavailable> in ?? ()
>       14   CPU 13            native_safe_halt () at arch/x86/include/asm/irqflags.h:54
>     ...
>     
>     crash> ps
>           PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
>     >       0       0   0  ffffffff819f9480  RU   0.0        0        0  [swapper/0]
>     >       0       0   1  ffff880169411fa0  RU   0.0        0        0  [swapper/1]
>     ...
>             0       0  23  ffff8801694e0000  RU   0.0        0        0  [swapper/23]
>             1       0  13  ffff880169b30000  IN   0.0   193052     4180  systemd
>     
>     "info threads" show the tasks which are currently running on each CPU. If we'd
>     like to view systemd, which are not running, task's stack unwinding, we
>     do the following:
>     
>     crash> set 1
>     or
>     crash> set ffff880169b30000
>     
>     Then the register cache of systemd will be swapped into CPU 13:
>     
>     crash> info threads
>     crash> info threads
>       Id   Target Id         Frame
>       1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
>     ...
>       8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000, reserved=reserved@entry=false) at block/blk-mq.c:640
>     ...
>       13   CPU 12            <unavailable> in ?? ()
>     * 14   CPU 13            0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000) at kernel/sched/core.c:2527
>     ...
>     
>     And we can view the stack unwinding of systemd:
>     
>     crash> bt
>     PID: 1        TASK: ffff880169b30000  CPU: 13   COMMAND: "systemd"
>      #0 [ffff880169b3bd58] __schedule at ffffffff816a8f65
>      #1 [ffff880169b3bdc0] schedule at ffffffff816a94e9
>      #2 [ffff880169b3bdd0] schedule_hrtimeout_range_clock at ffffffff816a86fd
>      #3 [ffff880169b3be68] schedule_hrtimeout_range at ffffffff816a8733
>      #4 [ffff880169b3be78] ep_poll at ffffffff8124bb7e
>      #5 [ffff880169b3bf30] sys_epoll_wait at ffffffff8124d00d
>      #6 [ffff880169b3bf80] system_call_fastpath at ffffffff816b5009
>         RIP: 00007f0449407923  RSP: 00007ffc35a3c378  RFLAGS: 00010246
>         RAX: 00000000000000e8  RBX: ffffffff816b5009  RCX: 0000000000000071
>         RDX: 000000000000001d  RSI: 00007ffc35a3d5a0  RDI: 0000000000000004
>         RBP: 00007ffc35a3d810   R8: 0000000000000000   R9: 0000000000000000
>         R10: 00000000ffffffff  R11: 0000000000000293  R12: 0000563ca2ebe980
>         R13: 0000000000000003  R14: ffffffffffffffff  R15: 0000000000000001
>         ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
>     crash> gdb bt
>      #0  0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000) at kernel/sched/core.c:2527
>      #1  __schedule () at kernel/sched/core.c:3540
>      #2  0xffffffff816a94e9 in schedule () at kernel/sched/core.c:3577
>      #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock (expires=expires@entry=0x0, delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS, clock=clock@entry=1) at kernel/hrtimer.c:1724
>      #4  0xffffffff816a8733 in schedule_hrtimeout_range (expires=expires@entry=0x0, delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/hrtimer.c:1778
>      #5  0xffffffff8124bb7e in ep_poll (ep=0xffff880fd861f8c0, events=events@entry=0x7ffc35a3d5a0, maxevents=maxevents@entry=29, timeout=timeout@entry=-1) at fs/eventpoll.c:1669
>      #6  0xffffffff8124d00d in SYSC_epoll_wait (timeout=<optimized out>, maxevents=29, events=<optimized out>, epfd=<optimized out>) at fs/eventpoll.c:2043
>      #7  SyS_epoll_wait (epfd=<optimized out>, events=140721208415648, maxevents=29, timeout=4294967295) at fs/eventpoll.c:2008
>      #8  <signal handler called>
>      #9  0x00007f0449407923 in ?? ()
>     
>     Signed-off-by: Tao Liu <ltao@xxxxxxxxxx>
>     Signed-off-by: Aditya Gupta <adityag@xxxxxxxxxxxxx>
> 
> diff --git a/crash_target.c b/crash_target.c
> index d06383f594aa..1df1e9d34a45 100644
> --- a/crash_target.c
> +++ b/crash_target.c
> @@ -29,10 +29,10 @@ extern "C" int gdb_readmem_callback(unsigned long, void *, int, int);
>  extern "C" int crash_get_nr_cpus(void);
>  extern "C" int crash_get_cpu_reg (int cpu, int regno, const char *regname,
>                                    int regsize, void *val);
> -extern "C" int gdb_change_cpu_context (unsigned int cpu);
>  extern "C" void gdb_refresh_regcache(unsigned int cpu);
>  extern "C" int set_cpu(int cpu, int print_context);
> -
> +extern "C" int crash_set_thread(ulong);
> +extern "C" int gdb_change_thread_context (ulong task);
>  
>  /* The crash target.  */
>  
> @@ -110,11 +110,13 @@ crash_target::xfer_partial (enum target_object object, const char *annex,
>  
>  #define CRASH_INFERIOR_PID 1
>  
> +crash_target *target = NULL;
> +
>  void
>  crash_target_init (void)
>  {
>    int nr_cpus = crash_get_nr_cpus();
> -  crash_target *target = new crash_target ();
> +  target = new crash_target ();
>  
>    /* Own the target until it is successfully pushed.  */
>    target_ops_up target_holder (target);
> @@ -137,27 +139,33 @@ crash_target_init (void)
>    reinit_frame_cache ();
>  }
>  
> -/*
> - * Change gdb's thread context to the thread on given CPU
> - **/
>  extern "C" int
> -gdb_change_cpu_context(unsigned int cpu)
> +gdb_change_thread_context (ulong task)
>  {
> +  int tried = 0;
> +  inferior* inf = current_inferior ();
> +  int cpu = crash_set_thread(task);
> +  if (cpu < 0)
> +    return FALSE;
> +
>    ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
> -  inferior *inf = current_inferior ();
> +
> +retry:
>    thread_info *tp = find_thread_ptid (inf, ptid);
> +  if (tp == nullptr && !tried) {
> +    thread_info *thread = add_thread_silent(target, ptid_t(CRASH_INFERIOR_PID, 0, cpu));
> +    tried++;
> +    if (thread) {
> +      goto retry;
> +    }
> +  }
>  
> -  if (tp == nullptr)
> +  if (tp == nullptr && tried)
>      return FALSE;
>  
> -  /* Making sure that crash's context is same */
> -  set_cpu(cpu, FALSE);
> -
> -  /* Switch to the thread */
> +  target_fetch_registers(get_thread_regcache(tp), -1);
>    switch_to_thread(tp);
> -
> -  /* Fetch/Refresh thread's registers */
> -  gdb_refresh_regcache(cpu);
> +  reinit_frame_cache ();
>  
>    return TRUE;
>  }
> diff --git a/defs.h b/defs.h
> index 49b606979d9e..d5cef621b465 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -8192,7 +8192,6 @@ enum ppc64_regnum {
>  };
>  
>  /* crash_target.c */
> -extern int gdb_change_cpu_context (unsigned int cpu);
>  extern void gdb_refresh_regcache (unsigned int cpu);
>  
>  #endif /* !GDB_COMMON */
> diff --git a/kernel.c b/kernel.c
> index ea5b5cb32914..50832ed906e5 100644
> --- a/kernel.c
> +++ b/kernel.c
> @@ -6544,6 +6544,29 @@ set_cpu(int cpu, int print_context)
>  		show_context(CURRENT_CONTEXT());
>  }
>  
> +int
> +crash_set_thread(ulong task)
> +{
> +	bool found = FALSE;
> +	struct task_context *tc = FIRST_CONTEXT();
> +
> +	for (int i = 0; i < RUNNING_TASKS(); i++, tc++) {
> +		if (tc->task == task) {
> +			found = TRUE;
> +			break;
> +		}
> +	}
> +
> +	if (!found)
> +		return -1;
> +
> +	if (CURRENT_TASK() == tc->task)
> +		return tc->processor;
> +
> +	set_context(tc->task, NO_PID);
> +	return tc->processor;
> +}
> +
>  
>  /*
>   *  Collect the irq_desc[] entry along with its associated handler and
> diff --git a/task.c b/task.c
> index a405b05a47d1..ef79f533f11a 100644
> --- a/task.c
> +++ b/task.c
> @@ -715,7 +715,8 @@ task_init(void)
>  	 * crash_target::fetch_registers, so CPU 0's registers are shown as
>  	 * <unavailable> in gdb mode
>  	 * */
> -	gdb_refresh_regcache(0);
> +	for (int i = 0; i < get_cpus_online(); i++)
> +		gdb_refresh_regcache(i);
>  
>  	tt->flags |= TASK_INIT_DONE;
>  }
> @@ -5315,7 +5316,7 @@ set_context(ulong task, ulong pid, uint update_gdb_thread)
>  
>  		/* change the selected thread in gdb, according to current context */
>  		if (update_gdb_thread)
> -			return gdb_change_cpu_context(tc->processor);
> +			return gdb_change_thread_context(tc->task);
>  		else
>  			return TRUE;
>  	} else {
> 
--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki