[Crash-utility] Re: [PATCH 2/5] Enable crash to change gdb thread context

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Tao,

On Thu, Mar 07, 2024 at 08:52:40PM +0800, Tao Liu wrote:
> Previously we can only view the stack unwinding for the tasks which are
> running on each CPUs. This patch will enable the ability to view
> arbitrary tasks stack unwinding.
> 
> After crash get initialized, "info threads" will output like the
> following:
> 
> crash> info threads
>   Id   Target Id         Frame
>   1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
> ...
> * 8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000, reserved=reserved@entry=false) at block/blk-mq.c:640
> ...
>   13   CPU 12            <unavailable> in ?? ()
>   14   CPU 13            native_safe_halt () at arch/x86/include/asm/irqflags.h:54
> ...
> 
> crash> ps
>       PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
> >       0       0   0  ffffffff819f9480  RU   0.0        0        0  [swapper/0]
> >       0       0   1  ffff880169411fa0  RU   0.0        0        0  [swapper/1]
> ...
> 	0       0  23  ffff8801694e0000  RU   0.0        0        0  [swapper/23]
> 	1       0  13  ffff880169b30000  IN   0.0   193052     4180  systemd
> 
> "info threads" show the tasks which are currently running on each CPU. If we'd
> like to view systemd task's stack unwinding, which is inactive status, we
> do the following:
> 
> crash> set 1
> or
> crash> set ffff880169b30000
> 
> Then the register cache of systemd will be swapped into CPU 13:
> 
> crash> info threads
>   Id   Target Id         Frame
>   1    CPU 0             native_safe_halt () at arch/x86/include/asm/irqflags.h:54
> ...
>   8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000, reserved=reserved@entry=false) at block/blk-mq.c:640
> ...
>   13   CPU 12            <unavailable> in ?? ()
> * 14   CPU 13            0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000) at kernel/sched/core.c:2527
> ...
> 
> And we can view the stack unwinding of systemd:
> 
> crash> bt
> PID: 1        TASK: ffff880169b30000  CPU: 13   COMMAND: "systemd"
>  #0 [ffff880169b3bd58] __schedule at ffffffff816a8f65
>  #1 [ffff880169b3bdc0] schedule at ffffffff816a94e9
>  #2 [ffff880169b3bdd0] schedule_hrtimeout_range_clock at ffffffff816a86fd
>  #3 [ffff880169b3be68] schedule_hrtimeout_range at ffffffff816a8733
>  #4 [ffff880169b3be78] ep_poll at ffffffff8124bb7e
>  #5 [ffff880169b3bf30] sys_epoll_wait at ffffffff8124d00d
>  #6 [ffff880169b3bf80] system_call_fastpath at ffffffff816b5009
>     RIP: 00007f0449407923  RSP: 00007ffc35a3c378  RFLAGS: 00010246
>     RAX: 00000000000000e8  RBX: ffffffff816b5009  RCX: 0000000000000071
>     RDX: 000000000000001d  RSI: 00007ffc35a3d5a0  RDI: 0000000000000004
>     RBP: 00007ffc35a3d810   R8: 0000000000000000   R9: 0000000000000000
>     R10: 00000000ffffffff  R11: 0000000000000293  R12: 0000563ca2ebe980
>     R13: 0000000000000003  R14: ffffffffffffffff  R15: 0000000000000001
>     ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
> crash> gdb bt
>  #0  0xffffffff816a8f65 in context_switch (rq=0x0, next=0x0, prev=0xffff880169b30000) at kernel/sched/core.c:2527
>  #1  __schedule () at kernel/sched/core.c:3540
>  #2  0xffffffff816a94e9 in schedule () at kernel/sched/core.c:3577
>  #3  0xffffffff816a86fd in schedule_hrtimeout_range_clock (expires=expires@entry=0x0, delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS, clock=clock@entry=1) at kernel/hrtimer.c:1724
>  #4  0xffffffff816a8733 in schedule_hrtimeout_range (expires=expires@entry=0x0, delta=delta@entry=0, mode=mode@entry=HRTIMER_MODE_ABS) at kernel/hrtimer.c:1778
>  #5  0xffffffff8124bb7e in ep_poll (ep=0xffff880fd861f8c0, events=events@entry=0x7ffc35a3d5a0, maxevents=maxevents@entry=29, timeout=timeout@entry=-1) at fs/eventpoll.c:1669
>  #6  0xffffffff8124d00d in SYSC_epoll_wait (timeout=<optimized out>, maxevents=29, events=<optimized out>, epfd=<optimized out>) at fs/eventpoll.c:2043
>  #7  SyS_epoll_wait (epfd=<optimized out>, events=140721208415648, maxevents=29, timeout=4294967295) at fs/eventpoll.c:2008
>  #8  <signal handler called>
>  #9  0x00007f0449407923 in ?? ()
> 
> Signed-off-by: Tao Liu <ltao@xxxxxxxxxx>
> ---
>  crash_target.c | 50 ++++++++++++++++-------------
>  defs.h         |  3 +-
>  gdb-10.2.patch | 85 +-------------------------------------------------
>  kernel.c       | 22 +++++++++++++
>  task.c         |  5 +--
>  5 files changed, 57 insertions(+), 108 deletions(-)
> 
> diff --git a/crash_target.c b/crash_target.c
> index d06383f..8efe2d6 100644
> --- a/crash_target.c
> +++ b/crash_target.c
> @@ -29,10 +29,10 @@ extern "C" int gdb_readmem_callback(unsigned long, void *, int, int);
>  extern "C" int crash_get_nr_cpus(void);
>  extern "C" int crash_get_cpu_reg (int cpu, int regno, const char *regname,
>                                    int regsize, void *val);
> -extern "C" int gdb_change_cpu_context (unsigned int cpu);
>  extern "C" void gdb_refresh_regcache(unsigned int cpu);
>  extern "C" int set_cpu(int cpu, int print_context);
> -
> +extern "C" int crash_set_thread(ulong);
> +extern "C" int gdb_change_thread_context (ulong);
>  
>  /* The crash target.  */
>  
> @@ -110,11 +110,13 @@ crash_target::xfer_partial (enum target_object object, const char *annex,
>  
>  #define CRASH_INFERIOR_PID 1
>  
> +crash_target *target = NULL;
> +
>  void
>  crash_target_init (void)
>  {
>    int nr_cpus = crash_get_nr_cpus();
> -  crash_target *target = new crash_target ();
> +  target = new crash_target ();
>  
>    /* Own the target until it is successfully pushed.  */
>    target_ops_up target_holder (target);
> @@ -137,29 +139,35 @@ crash_target_init (void)
>    reinit_frame_cache ();
>  }
>  
> -/*
> - * Change gdb's thread context to the thread on given CPU
> - **/
>  extern "C" int
> -gdb_change_cpu_context(unsigned int cpu)
> +gdb_change_thread_context (ulong task)
>  {
> -  ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
> -  inferior *inf = current_inferior ();
> -  thread_info *tp = find_thread_ptid (inf, ptid);
> -
> -  if (tp == nullptr)
> +  int tried = 0;
> +  inferior* inf = current_inferior();
> +  int cpu = crash_set_thread(task);
> +  if (cpu < 0)
>      return FALSE;
>  
> -  /* Making sure that crash's context is same */
> -  set_cpu(cpu, FALSE);
> -
> -  /* Switch to the thread */
> -  switch_to_thread(tp);
> -
> -  /* Fetch/Refresh thread's registers */
> -  gdb_refresh_regcache(cpu);
> +  ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
>  
> -  return TRUE;
> +retry:
> +   thread_info *tp = find_thread_ptid (inf, ptid);
> +   if (tp == nullptr && !tried) {
> +     thread_info *thread = add_thread_silent(target,
> +				ptid_t(CRASH_INFERIOR_PID, 0, cpu));

A minor comment.
Is it possible to do it without needing 'target' as global ? I see
there's 'inf->current_top_target()', but it's return type if target_ops,
don't know if it can be used.

> +     tried++;
> +     if (thread) {
> +       goto retry;
> +     }
> +   }
> +
> +   if (tp == nullptr && tried)
> +     return FALSE;
> +
> +   target_fetch_registers(get_thread_regcache(tp), -1);
> +   switch_to_thread(tp);
> +   reinit_frame_cache ();
> +   return TRUE;
>  }
>  
>  /* Refresh regcache of gdb thread on given CPU
> diff --git a/defs.h b/defs.h
> index 06bd27d..1ddfc5e 100644
> --- a/defs.h
> +++ b/defs.h
> @@ -6083,6 +6083,7 @@ void sort_tgid_array(void);
>  int sort_by_tgid(const void *, const void *);
>  int in_irq_ctx(ulonglong, int, ulong);
>  void check_stack_overflow(void);
> +int gdb_change_thread_context (ulong);
>  
>  /*
>   *  extensions.c
> @@ -6123,6 +6124,7 @@ void dump_kernel_table(int);
>  void dump_bt_info(struct bt_info *, char *where);
>  void dump_log(int);
>  void parse_kernel_version(char *);
> +int crash_set_thread(ulong);
>  
>  #define LOG_LEVEL(v) ((v) & 0x07)
>  #define SHOW_LOG_LEVEL    (0x1)
> @@ -8191,7 +8193,6 @@ enum ppc64_regnum {
>  };
>  
>  /* crash_target.c */
> -extern int gdb_change_cpu_context (unsigned int cpu);
>  extern void gdb_refresh_regcache (unsigned int cpu);
>  
>  #endif /* !GDB_COMMON */
> diff --git a/gdb-10.2.patch b/gdb-10.2.patch
> index 138413f..3e6677d 100644
> --- a/gdb-10.2.patch
> +++ b/gdb-10.2.patch
> @@ -16058,35 +16058,6 @@ exit 0
>   m10200-dis.c
>   m10200-opc.c
>   m10300-dis.c
> ---- gdb-10.2/gdb/thread.c.orig
> -+++ gdb-10.2/gdb/thread.c
> -@@ -58,6 +58,11 @@ static int highest_thread_num;
> - /* The current/selected thread.  */
> - static thread_info *current_thread_;
> - 
> -+#ifdef CRASH_MERGE
> -+/* Function to set cpu, defined by crash-utility */
> -+extern "C" void set_cpu (int);
> -+#endif
> -+
> - /* RAII type used to increase / decrease the refcount of each thread
> -    in a given list of threads.  */
> - 
> -@@ -1896,7 +1901,13 @@ thread_command (const char *tidstr, int from_tty)
> -     {
> -       ptid_t previous_ptid = inferior_ptid;
> - 
> --      thread_select (tidstr, parse_thread_id (tidstr, NULL));
> -+      struct thread_info* thread_id = parse_thread_id (tidstr, NULL);
> -+
> -+#ifdef CRASH_MERGE
> -+      set_cpu(thread_id->ptid.tid());
> -+#endif
> -+
> -+      thread_select (tidstr, thread_id);
> - 
> -       /* Print if the thread has not changed, otherwise an event will
> - 	 be sent.  */
>  --- gdb-10.2/gdb/ui-file.h.orig
>  +++ gdb-10.2/gdb/ui-file.h
>  @@ -195,6 +195,7 @@ public:
> @@ -16139,58 +16110,4 @@ exit 0
>  +            (dynamic_cast<stdio_file *>gdb_stderr)->set_stream(original_stderr_stream);
>   }
>   
> - /*
> ---- gdb-10.2/gdb/thread.c.orig
> -+++ gdb-10.2/gdb/thread.c
> -@@ -60,7 +60,7 @@ static thread_info *current_thread_;
> - 
> - #ifdef CRASH_MERGE
> - /* Function to set cpu, defined by crash-utility */
> --extern "C" void set_cpu (int);
> -+extern "C" void set_cpu(int cpu, int print_context);
> - #endif
> - 
> - /* RAII type used to increase / decrease the refcount of each thread
> -@@ -1098,6 +1098,9 @@ print_thread_info_1 (struct ui_out *uiout, const char *requested_threads,
> - 	uiout->table_body ();
> -       }
> - 
> -+#ifdef CRASH_MERGE
> -+	int current_cpu = current_thread ? current_thread->ptid.tid() : 0;
> -+#endif
> -     for (inferior *inf : all_inferiors ())
> -       for (thread_info *tp : inf->threads ())
> - 	{
> -@@ -1113,6 +1116,11 @@ print_thread_info_1 (struct ui_out *uiout, const char *requested_threads,
> - 
> - 	  ui_out_emit_tuple tuple_emitter (uiout, NULL);
> - 
> -+	  /* Switch to the thread's CPU in crash */
> -+#ifdef CRASH_MERGE
> -+	  set_cpu(tp->ptid.tid(), FALSE);
> -+#endif
> -+
> - 	  if (!uiout->is_mi_like_p ())
> - 	    {
> - 	      if (tp == current_thread)
> -@@ -1185,6 +1193,11 @@ print_thread_info_1 (struct ui_out *uiout, const char *requested_threads,
> -     /* This end scope restores the current thread and the frame
> -        selected before the "info threads" command, and it finishes the
> -        ui-out list or table.  */
> -+
> -+#ifdef CRASH_MERGE
> -+	  /* Restore crash's context */
> -+	  set_cpu(current_cpu, FALSE);
> -+#endif
> -   }
> - 
> -   if (pid == -1 && requested_threads == NULL)
> -@@ -1904,7 +1917,7 @@ thread_command (const char *tidstr, int from_tty)
> -       struct thread_info* thread_id = parse_thread_id (tidstr, NULL);
> - 
> - #ifdef CRASH_MERGE
> --      set_cpu(thread_id->ptid.tid());
> -+      set_cpu(thread_id->ptid.tid(), FALSE);
> - #endif
> - 
> -       thread_select (tidstr, thread_id);
> + /*
> \ No newline at end of file
> diff --git a/kernel.c b/kernel.c
> index c7de73a..49d422b 100644
> --- a/kernel.c
> +++ b/kernel.c
> @@ -6543,6 +6543,28 @@ set_cpu(int cpu, int print_context)
>  		show_context(CURRENT_CONTEXT());
>  }
>  
> +int
> +crash_set_thread(ulong task)
> +{
> +	bool found = FALSE;
> +	struct task_context *tc = FIRST_CONTEXT();
> +
> +	for (int i = 0; i < RUNNING_TASKS(); i++, tc++) {
> +		if (tc->task == task) {
> +			found = TRUE;
> +			break;
> +		}
> +	}
> +
> +	if (!found)
> +		return -1;
> +
> +	if (CURRENT_TASK() == tc->task)
> +		return tc->processor;
> +
> +	set_context(tc->task, NO_PID, TRUE);
> +	return tc->processor;
> +}
>  
>  /*
>   *  Collect the irq_desc[] entry along with its associated handler and
> diff --git a/task.c b/task.c
> index a405b05..ef79f53 100644
> --- a/task.c
> +++ b/task.c
> @@ -715,7 +715,8 @@ task_init(void)
>  	 * crash_target::fetch_registers, so CPU 0's registers are shown as
>  	 * <unavailable> in gdb mode
>  	 * */
> -	gdb_refresh_regcache(0);
> +	for (int i = 0; i < get_cpus_online(); i++)
> +		gdb_refresh_regcache(i);

Does x86 require all regcache(s) to be refreshed at init time ?

Till v4 of my ppc series, I also was doing refresh of all CPUs, but it's
better to do it when required, and there was a review in v4, patch 5
from Lianbo, quoting it here:

> In addition, I haven't tested how far it takes the time in the 
> multi-core environment, is it possible to implement a smaller 
> granularity refresh operation? For example: When we need switch to 
> another cpu, execute the refresh operation for corresponding regcache. 

CPU 0's refresh is required due to gdb having already tried to
initialised registers before we initialised crash_target

This might add a small noticeable delay on vmcores from big systems.

Adds around 4.6 seconds for a ppc64 vmcore with 64 CPUs, in crash init
time, while reading it on a fast 48 cpu Power8 system.
This overhead is less for systems with less CPUs, eg. ~0.43 seconds, on
a vmcore of 8 CPUs x86_64 system.

Thanks,
Aditya Gupta

>  
>  	tt->flags |= TASK_INIT_DONE;
>  }
> @@ -5315,7 +5316,7 @@ set_context(ulong task, ulong pid, uint update_gdb_thread)
>  
>  		/* change the selected thread in gdb, according to current context */
>  		if (update_gdb_thread)
> -			return gdb_change_cpu_context(tc->processor);
> +			return gdb_change_thread_context(tc->task);
>  		else
>  			return TRUE;
>  	} else {
> -- 
> 2.40.1
> 
--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux