[Crash-utility] Re: [PATCH 2/5] Enable crash to change gdb thread context

Aditya Gupta <adityag@xxxxxxxxxxxxx> · Wed, 13 Mar 2024 15:03:06 +0530

Hi Tao,
On Wed, Mar 13, 2024 at 11:23:56AM +0800, Tao Liu wrote:
> Hi Aditya,
> 
> On Tue, Mar 12, 2024 at 5:12 PM Aditya Gupta <adityag@xxxxxxxxxxxxx> wrote:
> >
> > Hi Tao,
> >
> > > <...>
> > >
> > > +crash_target *target = NULL;
> > > +
> > >  void
> > >  crash_target_init (void)
> > >  {
> > >    int nr_cpus = crash_get_nr_cpus();
> > > -  crash_target *target = new crash_target ();
> > > +  target = new crash_target ();
> > >
> > >    /* Own the target until it is successfully pushed.  */
> > >    target_ops_up target_holder (target);
> > > @@ -137,29 +139,35 @@ crash_target_init (void)
> > >    reinit_frame_cache ();
> > >  }
> > >
> > > -/*
> > > - * Change gdb's thread context to the thread on given CPU
> > > - **/
> > >  extern "C" int
> > > -gdb_change_cpu_context(unsigned int cpu)
> > > +gdb_change_thread_context (ulong task)
> > >  {
> > > -  ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
> > > -  inferior *inf = current_inferior ();
> > > -  thread_info *tp = find_thread_ptid (inf, ptid);
> > > -
> > > -  if (tp == nullptr)
> > > +  int tried = 0;
> > > +  inferior* inf = current_inferior();
> > > +  int cpu = crash_set_thread(task);
> > > +  if (cpu < 0)
> > >      return FALSE;
> > >
> > > -  /* Making sure that crash's context is same */
> > > -  set_cpu(cpu, FALSE);
> > > -
> > > -  /* Switch to the thread */
> > > -  switch_to_thread(tp);
> > > -
> > > -  /* Fetch/Refresh thread's registers */
> > > -  gdb_refresh_regcache(cpu);
> > > +  ptid_t ptid = ptid_t(CRASH_INFERIOR_PID, 0, cpu);
> > >
> > > -  return TRUE;
> > > +retry:
> > > +   thread_info *tp = find_thread_ptid (inf, ptid);
> > > +   if (tp == nullptr && !tried) {
> > > +     thread_info *thread = add_thread_silent(target,
> > > +                             ptid_t(CRASH_INFERIOR_PID, 0, cpu));
> >
> > A minor comment.
> > Is it possible to do it without needing 'target' as global ? I see
> > there's 'inf->current_top_target()', but it's return type if target_ops,
> > don't know if it can be used.
> >
> Thanks for your comments, however I guess it is not workable for the
> code change. What we want is to add a thread to the target. I found
> one function "target.c:new_thread", which can be used as
> "new_thread(current_inferior(), ptid_t(CRASH_INFERIOR_PID, 0, cpu));"
> to replace the "add_thread_silent()" code, except we need to make
> "new_thread" to be non-static.
> 
> I prefer to keep the code as current, because I don't see the side
> effect of making target as global variable. What do you think?

Then there doesn't seem to be an alternative, then I guess we will have
to keep the global variable, seems okay to me.

> 
> > > <...>
> > >
> > >  /*
> > >   *  Collect the irq_desc[] entry along with its associated handler and
> > > diff --git a/task.c b/task.c
> > > index a405b05..ef79f53 100644
> > > --- a/task.c
> > > +++ b/task.c
> > > @@ -715,7 +715,8 @@ task_init(void)
> > >        * crash_target::fetch_registers, so CPU 0's registers are shown as
> > >        * <unavailable> in gdb mode
> > >        * */
> > > -     gdb_refresh_regcache(0);
> > > +     for (int i = 0; i < get_cpus_online(); i++)
> > > +             gdb_refresh_regcache(i);
> >
> > Does x86 require all regcache(s) to be refreshed at init time ?
> >
> > Till v4 of my ppc series, I also was doing refresh of all CPUs, but it's
> > better to do it when required, and there was a review in v4, patch 5
> > from Lianbo, quoting it here:
> >
> > > In addition, I haven't tested how far it takes the time in the
> > > multi-core environment, is it possible to implement a smaller
> > > granularity refresh operation? For example: When we need switch to
> > > another cpu, execute the refresh operation for corresponding regcache.
> >
> > CPU 0's refresh is required due to gdb having already tried to
> > initialised registers before we initialised crash_target
> >
> > This might add a small noticeable delay on vmcores from big systems.
> >
> > Adds around 4.6 seconds for a ppc64 vmcore with 64 CPUs, in crash init
> > time, while reading it on a fast 48 cpu Power8 system.
> > This overhead is less for systems with less CPUs, eg. ~0.43 seconds, on
> > a vmcore of 8 CPUs x86_64 system.
> 
> Yes, thanks for the testing results. There do have performance
> downgrades of refreshing all cpu caches. But this will give correct
> results when user type "info threads" after showing "crash>":
> 
> With all cpu cache refreshed:
> crash> info threads
>   Id   Target Id         Frame
>   1    CPU 0             native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   2    CPU 1             native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   3    CPU 2             <unavailable> in ?? ()
>   4    CPU 3             <unavailable> in ?? ()
>   5    CPU 4             native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   6    CPU 5             <unavailable> in ?? ()
>   7    CPU 6             <unavailable> in ?? ()
> * 8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   9    CPU 8             native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   10   CPU 9             <unavailable> in ?? ()
>   11   CPU 10            <unavailable> in ?? ()
>   12   CPU 11            native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   13   CPU 12            <unavailable> in ?? ()
>   14   CPU 13            native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   15   CPU 14            <unavailable> in ?? ()
>   16   CPU 15            <unavailable> in ?? ()
>   17   CPU 16            <unavailable> in ?? ()
>   18   CPU 17            native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   19   CPU 18            native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   20   CPU 19            <unavailable> in ?? ()
>   21   CPU 20            <unavailable> in ?? ()
>   22   CPU 21            <unavailable> in ?? ()
>   23   CPU 22            native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   24   CPU 23            <unavailable> in ?? ()
> crash> q
> 
> With only cpu0 refreshed:
> crash> info threads
>   Id   Target Id         Frame
>   1    CPU 0             native_safe_halt () at
> arch/x86/include/asm/irqflags.h:54
>   2    CPU 1             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   3    CPU 2             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   4    CPU 3             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   5    CPU 4             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   6    CPU 5             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   7    CPU 6             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
> * 8    CPU 7             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   9    CPU 8             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   10   CPU 9             blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   11   CPU 10            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   12   CPU 11            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   13   CPU 12            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   14   CPU 13            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   15   CPU 14            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   16   CPU 15            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   17   CPU 16            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   18   CPU 17            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   19   CPU 18            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   20   CPU 19            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   21   CPU 20            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   22   CPU 21            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   23   CPU 22            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
>   24   CPU 23            blk_mq_rq_timed_out (req=0xffff880fdb246000,
> reserved=reserved@entry=false) at block/blk-mq.c:640
> 
> As we can see, other cpus[1-6, 8-23] just take the reg cache of
> cpu[7], which is incorrect. And if users go further like, "thread 20"
> and "gdb bt", it will also give incorrect stack traces.
> 
> The cpu cache will only get refreshed once user type "set <pid>", so
> the cpu cache will be refreshed by the <pid> task's context.
> 
> I doubt a user will understand all the details and constraints, so I'm
> afraid the user will be confused by the faulty output. But I also have
> no objection if the performance is the priority. Basically it is a
> balance of pays and gains. In addition, since cmd "info" and "thread"
> is a command provided by gdb, currently I don't know how to hack
> those, so cpu cache can be refreshed when "info threads" or "thread
> <num>" have been invoked.
> 
> Do you have any thoughts?

I also had faced that issue initially, ie. the other CPUs using up same
regcache, if all are not refreshed.
While iterating through all threads, gdb switches it's context
temporarily, while crash's context remained same, thus causing gdb to
get same registers for all threads other than 0.

This was solved in patch #3 (synchronise cpu context changes between crash/gdb)
in the ppc's 'Improve stack unwind on ppc64' series, by syncing gdb's
context with crash.

Can this change in thread.c in gdb-10.2.patch in patch #2 be reverted ?
That will fix it.

```
---- gdb-10.2/gdb/thread.c.orig
-+++ gdb-10.2/gdb/thread.c
-@@ -60,7 +60,7 @@ static thread_info *current_thread_;
- 
- #ifdef CRASH_MERGE
- /* Function to set cpu, defined by crash-utility */
--extern "C" void set_cpu (int);
-+extern "C" void set_cpu(int cpu, int print_context);
- #endif
- 
- /* RAII type used to increase / decrease the refcount of each thread
-@@ -1098,6 +1098,9 @@ print_thread_info_1 (struct ui_out *uiout, const char *requested_threads,
- 	uiout->table_body ();
-       }
- 
-+#ifdef CRASH_MERGE
-+	int current_cpu = current_thread ? current_thread->ptid.tid() : 0;
-+#endif
-     for (inferior *inf : all_inferiors ())
-       for (thread_info *tp : inf->threads ())
- 	{
-@@ -1113,6 +1116,11 @@ print_thread_info_1 (struct ui_out *uiout, const char *requested_threads,
- 
- 	  ui_out_emit_tuple tuple_emitter (uiout, NULL);
- 
-+	  /* Switch to the thread's CPU in crash */
-+#ifdef CRASH_MERGE
-+	  set_cpu(tp->ptid.tid(), FALSE);
-+#endif
-+
```

'info threads' keeps changing the gdb context temporarily, while it
prints each thread's frame and other information.
Correspondingly this changes ensured that crash's context also syncs
according to change in gdb's context.

Then, when user does 'info threads', it will refresh all regcache since
it has to go through each thread.

In that case that issue of the same frame being printed for all threads
should not happen.

That should remove the need to refresh all regcache at init time, and
leave it to later when user needs it. Refreshing all regcache at init
time is the sure shot way that all regcache are correct, but I guess
everyone will anyways do a 'info threads', but we can delay the overhead
to later. What do you think ?

Thanks,
Aditya Gupta

> 
> Thanks,
> Tao Liu
> 
> 
> 
> >
> > Thanks,
> > Aditya Gupta
> >
> > >
> > >       tt->flags |= TASK_INIT_DONE;
> > >  }
> > > @@ -5315,7 +5316,7 @@ set_context(ulong task, ulong pid, uint update_gdb_thread)
> > >
> > >               /* change the selected thread in gdb, according to current context */
> > >               if (update_gdb_thread)
> > > -                     return gdb_change_cpu_context(tc->processor);
> > > +                     return gdb_change_thread_context(tc->task);
> > >               else
> > >                       return TRUE;
> > >       } else {
> > > --
> > > 2.40.1
> > >
> >
> 
--
Crash-utility mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxxxxxx
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki