Re: patch for slight modification to runq -g command

"Chen, Anthony" <Anthony.Chen@xxxxxxxxxxxx> · Thu, 7 Nov 2013 01:40:25 +0000

Hi Dave,

I have cleaned up the code and added another change. The currently
running task is not in the rb tree (rb_root), so runq -g displays it like:

  CURRENT: PID: 9048   TASK: ffff8808b07e4200  COMMAND: "actmain"
  TASK_GROUP RT_RQ: ffff880002493820
  RT PRIO_ARRAY: ffff880002493820
     [no tasks queued]
  TASK_GROUP CFS_RQ: ffff8800024936e0
  CFS RB_ROOT: ffff880002493710
     GROUP CFS RB_ROOT: ffff882d609ce830 <TDAT>
        GROUP CFS RB_ROOT: ffff883f0bcbfa30 <User>
               [no tasks queued]

I can understand why the currently running task is not displayed.
However, the "-g" option displays all the task_groups the task
belongs to and then ends with "[no tasks queued]", which is
confusing. With the new change, the running task is displayed like:

  CURRENT: PID: 9048  CFS: ffff88039351a800 TASK: ffff8808b07e4200  COMMAND: "actmain"
  TASK_GROUP RT_RQ: ffff880002493820
  RT PRIO_ARRAY: ffff880002493820
     [no tasks queued]
  TASK_GROUP CFS_RQ: ffff8800024936e0
  CFS RB_ROOT: ffff880002493710
     GROUP: ffff884052bc9800 CFS_RQ: ffff882d609ce800 RB_ROOT: ffff882d609ce830 <TDAT> nr_running: 1 h_nr_running: 1 
        GROUP: ffff884058f1d000 CFS_RQ: ffff883f0bcbfa00 RB_ROOT: ffff883f0bcbfa30 <User> nr_running: 1 h_nr_running: 1 
              [120] PID: 9048   TASK: ffff8808b07e4200  COMMAND: "actmain"
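
Roughly, the display rule can be sketched as below. This is only a
self-contained illustration, not the attached patch: the mock_task and
mock_cfs_rq types and both functions are hypothetical stand-ins, and the
sketch assumes the group's running task is simply listed ahead of
whatever is found in the rb tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the data runq -g reads from the dump. */
struct mock_task {
    int prio;
    int pid;
    const char *comm;
};

struct mock_cfs_rq {
    const struct mock_task *queued; /* tasks found in the rb tree */
    int nr_queued;
    const struct mock_task *curr;   /* running task of this group, or NULL */
};

/* Format one task line; returns what snprintf() returns. */
static int format_task(char *buf, size_t len, const struct mock_task *t)
{
    return snprintf(buf, len, "[%d] PID: %d COMMAND: \"%s\"\n",
                    t->prio, t->pid, t->comm);
}

/*
 * List the running task first (it is not in the rb tree), then the
 * queued tasks. "[no tasks queued]" is emitted only when both are absent.
 * Output is silently cut short if buf is too small.
 */
static void format_group_tasks(char *buf, size_t len,
                               const struct mock_cfs_rq *rq)
{
    size_t used = 0;
    int i, n;

    assert(buf != NULL && len > 0 && rq != NULL);
    buf[0] = '\0';

    if (rq->curr != NULL) {
        n = format_task(buf, len, rq->curr);
        if (n < 0 || (size_t)n >= len)
            return;
        used = (size_t)n;
    }
    for (i = 0; i < rq->nr_queued; i++) {
        n = format_task(buf + used, len - used, &rq->queued[i]);
        if (n < 0 || (size_t)n >= len - used)
            return;
        used += (size_t)n;
    }
    if (used == 0)
        snprintf(buf, len, "[no tasks queued]\n");
}
```

With a group whose only task is the running one, the buffer holds that
task's line rather than "[no tasks queued]".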

Thanks,
Anthony

    > -----Original Message-----
    > From: crash-utility-bounces@xxxxxxxxxx [mailto:crash-utility-bounces@xxxxxxxxxx] On Behalf Of Dave Anderson
    > Sent: Thursday, October 17, 2013 12:46 PM
    > To: Discussion list for crash utility usage, maintenance and development
    > Subject: Re: [Crash-utility] FW: patch for slight modification to runq -g command
    > 
    > 
    > 
    > ----- Original Message -----
    > >
    > >
    > >
    > >
    > > Hi,
    > >
    > >
    > >
    > > We’re debugging an in-house application that makes extensive use of
    > > hard limits. We ran into a lot of timing windows (all of our own
    > > making), and we use the runq -g command a lot. The current runq -g
    > > displays only the RB_ROOT pointer, which makes it inconvenient to
    > > traverse the task_group hierarchy. It would be nice to have the
    > > command also display the corresponding task_group and cfs_rq
    > > pointers (at a minimum). Since we crash the system by messing up
    > > nr_running and h_nr_running, we also display those two fields at the
    > > same time. Here’s an example of before and after.
    > >
    > >
    > >
    > > CPU 4
    > > CURRENT: PID: 0 TASK: ffff8840668c0380 COMMAND: "swapper"
    > > TASK_GROUP RT_RQ: ffff8800027d3820
    > > RT PRIO_ARRAY: ffff8800027d3820
    > > [no tasks queued]
    > > TASK_GROUP CFS_RQ: ffff8800027d36e0
    > > CFS RB_ROOT: ffff8800027d3710
    > > GROUP CFS RB_ROOT: ffff883ff69bcc30 <TDAT>
    > > GROUP CFS RB_ROOT: ffff884006290e30 <User>
    > > GROUP CFS RB_ROOT: ffff88400641c430 <TDWMVP1>
    > > GROUP CFS RB_ROOT: ffff88400646b030 <ServDown1:0>
    > > GROUP CFS RB_ROOT: ffff884006492630 <ServOrder1:1>
    > > GROUP CFS RB_ROOT: ffff883ff058fe30 <TDWMWD57> (THROTTLED)
    > > GROUP CFS RB_ROOT: ffff88047889ee30 <S:WD:3d:35fa>
    > > [120] PID: 27655 TASK: ffff8805078ce2c0 COMMAND: "actmain"
    > > <<< more throttled groups removed >>>
    > >
    > >
    > >
    > > CPU 4
    > > CURRENT: PID: 0 CFS: ffff8800027d36e0 TASK: ffff8840668c0380 COMMAND: "swapper"
    > > TASK_GROUP RT_RQ: ffff8800027d3820
    > > RT PRIO_ARRAY: ffff8800027d3820
    > > [no tasks queued]
    > > TASK_GROUP CFS_RQ: ffff8800027d36e0
    > > CFS RB_ROOT: ffff8800027d3710
    > > GROUP: ffff88405394d000 CFS: ffff883ff69bcc00 RB_ROOT: ffff883ff69bcc30 <TDAT> (0 0)
    > > GROUP: ffff88405906c400 CFS: ffff884006290e00 RB_ROOT: ffff884006290e30 <User> (0 0)
    > > GROUP: ffff884055081000 CFS: ffff88400641c400 RB_ROOT: ffff88400641c430 <TDWMVP1> (0 0)
    > > GROUP: ffff884055081c00 CFS: ffff88400646b000 RB_ROOT: ffff88400646b030 <ServDown1:0> (0 0)
    > > GROUP: ffff8840580fd400 CFS: ffff884006492600 RB_ROOT: ffff884006492630 <ServOrder1:1> (0 0)
    > > GROUP: ffff884058f58c00 CFS: ffff883ff058fe00 RB_ROOT: ffff883ff058fe30 <TDWMWD57> (7 9) (THROTTLED)
    > > GROUP: ffff8808fb976000 CFS: ffff88047889ee00 RB_ROOT: ffff88047889ee30 <S:WD:3d:35fa> (1 1)
    > > [120] PID: 27655 TASK: ffff8805078ce2c0 COMMAND: "actmain"
    > > <<< more throttled groups removed >>>
    > >
    > >
    > >
    > > I have attached the patch we use to display additional information.
    > > Could you please take a look at my proposal and see whether this
    > > kind of display format could be included?
    > >
    > >
    > >
    > > Thanks,
    > >
    > > Anthony
    > 
    > A couple quick observations...
    > 
    > Adding the CFS: address makes sense, but besides you and me and
    > whoever reads the patch/code, how would anybody know what the two
    > numbers inside the parentheses mean?  It could be documented in the
    > help page, but I wonder if it could be made more obvious somehow?
    > 
    > Anyway, I ran a test on a sample set of vmcores, and this addition
    > will only work with relevant kernel versions.  For example, in these
    > four dumpfiles (and therefore their kernel versions), the command
    > fails because the cfs_rq.h_nr_running member does not exist:
    > 
    > 2.6.38.2-9.fc15:
    > 
    > CPU 0
    >   CURRENT: PID: 1180  CFS: ffff880037ef1b00 TASK: ffff88003bea2e40 COMMAND: "crash"
    >   TASK_GROUP RT_RQ: ffff88003fc13988
    >   RT PRIO_ARRAY: ffff88003fc13988
    >      [no tasks queued]
    >   TASK_GROUP CFS_RQ: ffff88003fc138b0
    >   CFS RB_ROOT: ffff88003fc138d8
    >      GROUP: ffff88003c610a00 CFS: ffff880037ef1b00 RB_ROOT: ffff880037ef1b28
    > runq: invalid structure member offset: cfs_rq_h_nr_running
    >       FILE: task.c  LINE: 7632  FUNCTION: print_group_header_fair()
    > 
    > 
    > 2.6.40.4-5.fc15:
    > 
    > CPU 1
    >   CURRENT: PID: 1341  CFS: ffff880037592f00 TASK: ffff880037409730 COMMAND: "crash"
    >   TASK_GROUP RT_RQ: ffff88003fc92690
    >   RT PRIO_ARRAY: ffff88003fc92690
    >      [no tasks queued]
    >   TASK_GROUP CFS_RQ: ffff88003fc925b0
    >   CFS RB_ROOT: ffff88003fc925d8
    >      GROUP: ffff88003a84b800 CFS: ffff880037592f00 RB_ROOT: ffff880037592f28
    > runq: invalid structure member offset: cfs_rq_h_nr_running
    >       FILE: task.c  LINE: 7632  FUNCTION: print_group_header_fair()
    > 
    > 
    > 2.6.32-131.0.15.el6:
    > 
    > CPU 0
    >   CURRENT: PID: 28263 CFS: ffff8800794b7140 TASK: ffff880037aaa040 COMMAND: "loop.ABA"
    >   TASK_GROUP RT_RQ: ffff88000a216098
    >   RT PRIO_ARRAY: ffff88000a216098
    >      [no tasks queued]
    >   TASK_GROUP CFS_RQ: ffff88000a215fe8
    >   CFS RB_ROOT: ffff88000a216010
    >      GROUP: ffff8800785bc200 CFS: ffff8800784d7c80 RB_ROOT: ffff8800784d7ca8 <A>
    > runq: invalid structure member offset: cfs_rq_h_nr_running
    >       FILE: task.c  LINE: 7632  FUNCTION: print_group_header_fair()
    > 
    > 
    > 3.1.7-1.fc16:
    > 
    > CPU 2
    >   CURRENT: PID: 1495  CFS: ffff8800277f8500 TASK: ffff880037a60000 COMMAND: "crash"
    >   TASK_GROUP RT_RQ: ffff88003e2532d0
    >   RT PRIO_ARRAY: ffff88003e2532d0
    >      [no tasks queued]
    >   TASK_GROUP CFS_RQ: ffff88003e2531f0
    >   CFS RB_ROOT: ffff88003e253218
    >      GROUP: ffff88002722b000 CFS: ffff8800277f8500 RB_ROOT: ffff8800277f8528
    > runq: invalid structure member offset: cfs_rq_h_nr_running
    >       FILE: task.c  LINE: 7632  FUNCTION: print_group_header_fair()
    > 
    > So you'll need to check VALID_MEMBER(cfs_rq_h_nr_running) before
    > attempting to print it, and maybe just show the cfs_rq.nr_running
    > value when h_nr_running is not available.
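
A minimal sketch of that guard, for illustration only. It does not use
crash's real defs.h: mock_offset_table, mock_valid_member() and
format_group_counters() are hypothetical stand-ins, and the sketch
assumes an unresolved member offset is stored as a negative value. The
counters are labeled so the numbers are self-explanatory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for the two offsets involved. */
struct mock_offset_table {
    long cfs_rq_nr_running;
    long cfs_rq_h_nr_running;   /* negative when the kernel lacks the member */
};

/* Stand-in for a VALID_MEMBER()-style test on one offset. */
static int mock_valid_member(long offset)
{
    return offset >= 0;
}

/*
 * Format the counter part of a group header line. h_nr_running is
 * appended only when its offset was resolved for this kernel.
 * Returns the number of characters written, or the failing snprintf()
 * result if the first field does not fit.
 */
static int format_group_counters(char *buf, size_t len,
                                 const struct mock_offset_table *ot,
                                 long nr_running, long h_nr_running)
{
    int n;

    assert(buf != NULL && ot != NULL);
    n = snprintf(buf, len, "nr_running: %ld", nr_running);
    if (n < 0 || (size_t)n >= len)
        return n;
    if (mock_valid_member(ot->cfs_rq_h_nr_running))
        n += snprintf(buf + n, len - (size_t)n,
                      " h_nr_running: %ld", h_nr_running);
    return n;
}
```

On a kernel without cfs_rq.h_nr_running the line then carries only the
nr_running field instead of failing with an invalid-offset error.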
    > 
    > When adding new items to the offset_table, they should be added to
    > the end of the structure so that extension modules will continue to
    > be able to access any offsets as expected without having to be
    > re-compiled. But you can display the new item in dump_offset_table()
    > wherever you'd like, typically grouped with similar/associated items
    > -- although in your patch you did this:
    > 
    > +       fprintf(fp, "              cfs_rq_nr_running: %ld\n",
    > +               OFFSET(cfs_rq_nr_running));
    > +       fprintf(fp, "              cfs_rq_h_nr_running: %ld\n",
    > +               OFFSET(cfs_rq_h_nr_running));
    > 
    > The cfs_rq_nr_running display already exists, so I suggest that you
    > just move your new cfs_rq_h_nr_running display underneath it (and
    > please line up the "help -o" output of the new item as well).
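
To illustrate why new offset_table members belong at the end, here is a
small self-contained sketch. The three structs are hypothetical,
heavily shortened stand-ins for the real table, and the two helper
functions exist only for this example. An extension module compiled
against the old layout reads members at fixed byte positions, so those
positions must not move.

```c
#include <assert.h>
#include <stddef.h>

/* Layout an already-built extension module was compiled against. */
struct offset_table_v1 {
    long task_struct_pid;
    long cfs_rq_nr_running;
    long cfs_rq_tasks_timeline;
};

/* New member appended at the end: existing members keep their positions. */
struct offset_table_appended {
    long task_struct_pid;
    long cfs_rq_nr_running;
    long cfs_rq_tasks_timeline;
    long cfs_rq_h_nr_running;
};

/* New member placed next to its sibling: every later member shifts. */
struct offset_table_inserted {
    long task_struct_pid;
    long cfs_rq_nr_running;
    long cfs_rq_h_nr_running;
    long cfs_rq_tasks_timeline;
};

/* 1 if an old module still finds cfs_rq_tasks_timeline in the same slot. */
static int appended_layout_is_compatible(void)
{
    return offsetof(struct offset_table_v1, cfs_rq_tasks_timeline) ==
           offsetof(struct offset_table_appended, cfs_rq_tasks_timeline);
}

/* Same check against the layout with the member inserted mid-structure. */
static int inserted_layout_is_compatible(void)
{
    return offsetof(struct offset_table_v1, cfs_rq_tasks_timeline) ==
           offsetof(struct offset_table_inserted, cfs_rq_tasks_timeline);
}
```

Only the appended layout passes the check. The display order in
dump_offset_table() is unaffected by this, since it is just the order
of the fprintf() calls.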
    > 
    > Dave
    > 
    > 
    > 
    > --
    > Crash-utility mailing list
    > Crash-utility@xxxxxxxxxx
    > https://www.redhat.com/mailman/listinfo/crash-utility
Attachment: crash_runq-g_v2.patch