[Crash-utility] Re: [PATCH v4 5/5] fix 'info threads' command

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Lianbo,
Thanks for the reviews.

On Mon, Dec 18, 2023 at 09:00:54PM +0800, Lianbo Jiang wrote:
> On 12/15/23 15:59, Aditya Gupta wrote:
> 
> > Currently the output is like this ppc64le:
> > 
> > crash> info threads
> >    Id   Target Id         Frame
> >    1    CPU 0             <unavailable> in ?? ()
> >    2    CPU 1             0xc000000000025d00 in ppc_panic_event (this=<optimized out>, event=<optimized out>, ptr=0x0) at arch/powerpc/kernel/setup-common.c:712
> >    3    CPU 2             0xc000000000025d00 in ppc_panic_event (this=<optimized out>, event=<optimized out>, ptr=0x0) at arch/powerpc/kernel/setup-common.c:712
> >    4    CPU 3             0xc000000000025d00 in ppc_panic_event (this=<optimized out>, event=<optimized out>, ptr=0x0) at arch/powerpc/kernel/setup-common.c:712
> > * 5    CPU 4             0xc000000000025d00 in ppc_panic_event (this=<optimized out>, event=<optimized out>, ptr=0x0) at arch/powerpc/kernel/setup-common.c:712
> >    6    CPU 5             0xc000000000025d00 in ppc_panic_event (this=<optimized out>, event=<optimized out>, ptr=0x0) at arch/powerpc/kernel/setup-common.c:712
> >    7    CPU 6             0xc000000000025d00 in ppc_panic_event (this=<optimized out>, event=<optimized out>, ptr=0x0) at arch/powerpc/kernel/setup-common.c:712
> >    8    CPU 7             0xc000000000025d00 in ppc_panic_event (this=<optimized out>, event=<optimized out>, ptr=0x0) at arch/powerpc/kernel/setup-common.c:712
> > 
> > There are two issues here:
> > 
> > 1. CPU 0 always shows '<unavailable>' frame:
> > 
> >     This is due to 'thread 0' being initialised in 'crash_target_init',
> >     but it's very early in crash init, so it could not get any registers
> >     from the vmcore.
> > 
> >     And, since GDB caches registers (whether it is there or now), it
> >     keeps thinking that registers for this thread are unavailable, so all
> >     future usage of thread 0 also shows unavailable registers & frame
> > 
> >     Fix this by refreshing the register cache of all threads in
> >     'gdb_refresh_regcache', at task_init
> > 
> > 2. All other CPUs show the same frame:
> > 
> >     For each thread, GDB depends on crash to give it registers, ie.
> >     crash_target::fetch_registers, but this in turn depends on crash's
> >     current context
> > 
> >     GDB internally switches it's thread context internally to each thread
> >     one by one, while printing the thread's frame, similarly switch
> >     crash's context when the thread context is changed, so that it
> >     matches the thread context in gdb (such as, CPU 3 in crash for thread
> >     2(CPU 3) in gdb).
> > 
> >     Also, since we are switching context multiple times, add an argument
> >     to 'set_cpu' to not print the context everytime
> > 
> > With these fixed, it will show all threads, as well as correct registers
> > and frame. It might still show similar frame for many threads, due to
> > all of them being on same NMI exception crash path, but if checked with
> > 'info registers', they will have different registers.
> > 
> > Signed-off-by: Aditya Gupta <adityag@xxxxxxxxxxxxx>
> > ---
> >   crash_target.c | 36 +++++++++++++++++++++++++++++++-----
> >   defs.h         |  3 ++-
> >   gdb-10.2.patch | 40 +++++++++++++++++++++++++++++++++++++---
> >   kernel.c       |  6 ++++--
> >   task.c         |  9 +++++++++
> >   tools.c        |  2 +-
> >   6 files changed, 84 insertions(+), 12 deletions(-)
> > 
> > diff --git a/crash_target.c b/crash_target.c
> > index 223a51acb1e1..96a65a1fa55c 100644
> > --- a/crash_target.c
> > +++ b/crash_target.c
> > @@ -30,7 +30,8 @@ extern "C" int crash_get_nr_cpus(void);
> >   extern "C" int crash_get_cpu_reg (int cpu, int regno, const char *regname,
> >                                     int regsize, void *val);
> >   extern "C" int gdb_change_cpu_context (unsigned int cpu);
> > -extern "C" int set_cpu (int cpu);
> > +extern "C" void gdb_refresh_regcache(void);
> > +extern "C" int set_cpu(int cpu, int print_context);
> >   /* The crash target.  */
> > @@ -139,19 +140,44 @@ crash_target_init (void)
> >   /*
> >    * Change gdb's thread context to the thread on given CPU
> >    * */
> > -extern "C" int gdb_change_cpu_context (unsigned int cpu) {
> > +extern "C" int
> > +gdb_change_cpu_context(unsigned int cpu)
> > +{
> >     ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
> > -  inferior* inf = current_inferior ();
> > +  inferior *inf = current_inferior ();
> >     thread_info *tp = find_thread_ptid (inf, ptid);
> >     if (tp == nullptr)
> > -	  return FALSE;
> > +    return FALSE;
> >     /* Making sure that crash's context is same */
> > -  set_cpu(cpu);
> > +  set_cpu(cpu, FALSE);
> >     /* Switch to the thread */
> >     switch_to_thread(tp);
> >     return TRUE;
> >   }
> > +/* Refresh regcache of all gdb threads
> > + *
> > + * When gdb threads were initially added by 'crash_target_init', crash was not
> > + * yet initialised, and hence crash_target::fetch_registers didn't give any
> > + * registers to gdb.
> > + *
> > + * This is meant to be called after tasks in crash have been initialised, and
> > + * possible machdep->get_cpu_reg is also set so architecture can give registers
> > + */
> > +extern "C" void
> > +gdb_refresh_regcache(void)
> > +{
> > +  inferior *inf = current_inferior();
> > +  int saved_cpu = inferior_thread()->ptid.tid();
> > +
> > +  for (thread_info * tp : inf->threads()) {
> > +    /* temporarily switch to the cpu so we get correct registers */
> > +    set_cpu(tp->ptid.tid(), FALSE);
> > +    target_fetch_registers(get_thread_regcache(tp), -1);
> > +  }
> > +
> > +  set_cpu(saved_cpu, FALSE);
> > +}
> 
> The 'saved_cpu' will overwrite the value of set_cpu(tp->ptid.tid(), FALSE)
> in the for-loop, is it expected behavior?

Yes that is expected behavior, the idea is to save the current cpu, then change
the cpu temporarily for each thread, then restore the context to 'saved_cpu'.
Didn't want to change the context before entring and after exiting this
function.

> 
> In addition, I haven't tested how far it takes the time in the multi-core
> environment, is it possible to implement a smaller granularity refresh
> operation? For example: When we need switch to another cpu, execute the
> refresh operation for corresponding regcache.
> 
> Just my thoughts, I did not try it, not sure if this is doable.

True, a smaller granularity should be doable, but might not be a small diff.
Yes, it can take time on a bigger system. Ideally set_cpu in crash should NOT be
very costly, as all tasks have already been cached by crash for all cpus in
tt.context_array, though fetching the registers by gdb should be the more costly
operation.

With current code, it causes all registers to be loaded for all CPUs, so yes
increasing the burden in initialisation rather than lazy loading later.

A smaller impact change might be to refresh registers during
'gdb_change_cpu_context', 'select_thread', and 'task_init' (to refresh for CPU
0, which at init time is unavailable).

Will try that change in next version.

Thanks for the idea

> 
> Thanks.
> 
> Lianbo
> 
> > diff --git a/defs.h b/defs.h
> > index 5e6df14e947d..240e5c27b223 100644
> > --- a/defs.h
> > +++ b/defs.h
> > @@ -6044,7 +6044,7 @@ void parse_kernel_version(char *);
> >   #define SHOW_LOG_AUDIT (0x8)
> >   #define SHOW_LOG_CTIME (0x10)
> >   #define SHOW_LOG_SAFE  (0x20)
> > -void set_cpu(int);
> > +void set_cpu(int cpu, int print_context);
> >   void clear_machdep_cache(void);
> >   struct stack_hook *gather_text_list(struct bt_info *);
> >   int get_cpus_online(void);
> > @@ -7980,5 +7980,6 @@ enum ppc64_renum {
> >   /* crash_target.c */
> >   extern int gdb_change_cpu_context (unsigned int cpu);
> > +extern void gdb_refresh_regcache (void);
> >   #endif /* !GDB_COMMON */
> > diff --git a/gdb-10.2.patch b/gdb-10.2.patch
> > index a4e45fa0e6fa..9051c83ac784 100644
> > --- a/gdb-10.2.patch
> > +++ b/gdb-10.2.patch
> > @@ -494,13 +494,47 @@ exit 0
> >   +#ifdef CRASH_MERGE
> >   +/* Function to set cpu, defined by crash-utility */
> > -+extern "C" void set_cpu (int);
> > ++extern "C" void set_cpu(int cpu, int print_context);
> >   +#endif
> >   +
> >    /* RAII type used to increase / decrease the refcount of each thread
> >       in a given list of threads.  */
> > -@@ -1896,7 +1901,13 @@ thread_command (const char *tidstr, int from_tty)
> > +@@ -1093,6 +1098,9 @@ print_thread_info_1 (struct ui_out *uiout, const char *requested_threads,
> > + 	uiout->table_body ();
> > +       }
> > +
> > ++#ifdef CRASH_MERGE
> > ++	int current_cpu = current_thread ? current_thread->ptid.tid() : 0;
> > ++#endif
> > +     for (inferior *inf : all_inferiors ())
> > +       for (thread_info *tp : inf->threads ())
> > + 	{
> > +@@ -1108,6 +1116,11 @@ print_thread_info_1 (struct ui_out *uiout, const char *requested_threads,
> > +
> > + 	  ui_out_emit_tuple tuple_emitter (uiout, NULL);
> > +
> > ++	  /* Switch to the thread's CPU in crash */
> > ++#ifdef CRASH_MERGE
> > ++	  set_cpu(tp->ptid.tid(), FALSE);
> > ++#endif
> > ++
> > + 	  if (!uiout->is_mi_like_p ())
> > + 	    {
> > + 	      if (tp == current_thread)
> > +@@ -1180,6 +1193,11 @@ print_thread_info_1 (struct ui_out *uiout, const char *requested_threads,
> > +     /* This end scope restores the current thread and the frame
> > +        selected before the "info threads" command, and it finishes the
> > +        ui-out list or table.  */
> > ++
> > ++#ifdef CRASH_MERGE
> > ++	  /* Restore crash's context */
> > ++	  set_cpu(current_cpu, FALSE);
> > ++#endif
> > +   }
> > +
> > +   if (pid == -1 && requested_threads == NULL)
> > +@@ -1896,7 +1914,13 @@ thread_command (const char *tidstr, int from_tty)
> >        {
> >          ptid_t previous_ptid = inferior_ptid;
> > @@ -508,7 +542,7 @@ exit 0
> >   +      struct thread_info* thread_id = parse_thread_id (tidstr, NULL);
> >   +
> >   +#ifdef CRASH_MERGE
> > -+      set_cpu(thread_id->ptid.tid());
> > ++      set_cpu(thread_id->ptid.tid(), FALSE);
> >   +#endif
> >   +
> >   +      thread_select (tidstr, thread_id);
> > diff --git a/kernel.c b/kernel.c
> > index a8d60507dd95..edb728aab06d 100644
> > --- a/kernel.c
> > +++ b/kernel.c
> > @@ -6491,9 +6491,10 @@ no_cpu_flags:
> >   /*
> >    *  Set the context to the active task on a given cpu -- dumpfiles only.
> > + *  Optionally show the context if print_context is TRUE
> >    */
> >   void
> > -set_cpu(int cpu)
> > +set_cpu(int cpu, int print_context)
> >   {
> >   	ulong task;
> > @@ -6515,7 +6516,8 @@ set_cpu(int cpu)
> >   	else
> >   		error(FATAL, "cannot determine active task on cpu %ld\n", cpu);
> > -	show_context(CURRENT_CONTEXT());
> > +	if (print_context)
> > +		show_context(CURRENT_CONTEXT());
> >   }
> > diff --git a/task.c b/task.c
> > index 64088eca29a1..19894ae3a266 100644
> > --- a/task.c
> > +++ b/task.c
> > @@ -706,6 +706,15 @@ task_init(void)
> >   		kt->boot_date.tv_nsec = 0;
> >   	}
> > +	/*
> > +	 * Refresh gdb thread's regcache
> > +	 *
> > +	 * This is required since, threads were initialised in crash_target_init
> > +	 * when crash was not initialised yet and hence could not pass
> > +	 * registers to gdb when gdb requests via crash_target::fetch_registers.
> > +	 * */
> > +	gdb_refresh_regcache();
> > +
> >   	tt->flags |= TASK_INIT_DONE;
> >   }
> > diff --git a/tools.c b/tools.c
> > index 0f2db108838a..40a78cb392cc 100644
> > --- a/tools.c
> > +++ b/tools.c
> > @@ -1860,7 +1860,7 @@ cmd_set(void)
> >   				break;
> >   			}
> >   			cpu = dtoi(optarg, FAULT_ON_ERROR, NULL);
> > -			set_cpu(cpu);
> > +			set_cpu(cpu, TRUE);
> >   			return;
> >   		case 'p':
> 
--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux