Re: [PATCH] Add -r option to timer

----- Original Message -----
> Hello Dave,
> 
> The attached patches add support for displaying hrtimers.
> 
> For detailed information, please refer to the patches.
> 
> --
> Regards
> Qiao Nuohan
 
Hi Qiao,

I ran this option through my set of sample vmcores, and have a couple of
comments/suggestions.

A few of the sample vmcores generated bad rb_node.last virtual addresses,
and when they did, the command aborted.  For example:

  $ crash -s 2.6.32-313.el6_softlockup/vmcore 2.6.32-313.el6_softlockup/vmlinux.gz
  crash> timer -r
  UPTIME: 300039(1000HZ)
  
  cpu: 0
   clock: 0
    .base:        ffff8800282115a8
    .offset:      1354043557752205725
    .get_time:    ktime_get_real
   EXPIRES      HRTIMER           FUNCTION
  (empty)
  
   clock: 1
    .base:        ffff8800282115e8
    .offset:      0
    .get_time:    ktime_get
             EXPIRES                 HRTIMER           FUNCTION
    300040000000-300040000000    ffff8800282116a0  ffffffff810a4b70  <tick_sched_timer>
   3660368239466-3660368239466   ffff880224f07c68  ffffffff81071c00  <it_real_fn>
   3660700491472-3660700491472   ffff880224f07068  ffffffff81071c00  <it_real_fn>
  14461470907794-14461570907794  ffff88022750fa68  ffffffff81098160  <hrtimer_wakeup>
  
   clock: 2
    .base:        ffff880028211628
    .offset:      0
    .get_time:
  timer: invalid kernel virtual address: 1bc2f  type: "rb_node last"
  crash>
  
Here are some other vmcore examples where it failed similarly:

  timer: invalid kernel virtual address: 1  type: "rb_node last"
  timer: invalid kernel virtual address: 1ffffffff  type: "rb_node last"
  timer: invalid kernel virtual address: 1  type: "rb_node last"
  timer: invalid kernel virtual address: 1  type: "rb_node last"

Perhaps there could be a way to pre-verify the addresses with
accessible(), and if an address is bogus, display an error message but
allow the command to continue on with the other cpus?
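Something along the lines of this sketch is what I have in mind -- the
helper name and the call site are made up, but accessible() and error()
are the existing crash internals:

  /*
   * Hypothetical sketch only -- verify the rb_node.last pointer
   * before it gets dereferenced.  accessible() returns FALSE if
   * the kernel virtual address cannot be read from the dumpfile.
   */
  static int
  verify_rb_last(ulong rb_last)
  {
          if (!rb_last || !accessible(rb_last)) {
                  error(INFO,
                      "invalid rb_node.last address: %lx -- "
                      "continuing with next cpu\n", rb_last);
                  return FALSE;
          }
          return TRUE;
  }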

Secondly, I ran into numerous examples of runaway commands that loop
forever over the same hrtimer entries.  The dumpfiles involved were
typically ones taken with the "snap.so" extension module or with
"virsh dump" -- but not always.

For example:
  
  $ crash -s 2.6.18-152.el5_HVM_virsh_dump/vmcore 2.6.18-152.el5_HVM_virsh_dump/vmlinux.gz
  ... [ cut ] ...
  cpu: 1
   clock: 0
    .base:        ffff810009580aa0
    .get_time:    ktime_get_real
   EXPIRES      HRTIMER           FUNCTION
  (empty)
  
   clock: 1
    .base:        ffff810009580ae0
    .get_time:    ktime_get
     EXPIRES        HRTIMER           FUNCTION
  901601121855  ffff810034107ee8  ffffffff800a2536  <hrtimer_wakeup>
  930465261855  ffff81003c22b5b0  ffffffff80093ced  <it_real_fn>
  922835889855  ffff81003333bee8  ffffffff800a2536  <hrtimer_wakeup>
  930465261855  ffff81003c22b5b0  ffffffff80093ced  <it_real_fn>
  922835889855  ffff81003333bee8  ffffffff800a2536  <hrtimer_wakeup>
  930465261855  ffff81003c22b5b0  ffffffff80093ced  <it_real_fn>
  922835889855  ffff81003333bee8  ffffffff800a2536  <hrtimer_wakeup>
  930465261855  ffff81003c22b5b0  ffffffff80093ced  <it_real_fn>
  922835889855  ffff81003333bee8  ffffffff800a2536  <hrtimer_wakeup>
  930465261855  ffff81003c22b5b0  ffffffff80093ced  <it_real_fn>
  922835889855  ffff81003333bee8  ffffffff800a2536  <hrtimer_wakeup>
  ... [ forever ] ....
  
  $ crash -s snapshot-3.1.7-1.fc16/vmcore snapshot-3.1.7-1.fc16/vmlinux.gz
  crash> timer -r
  ... [ cut ] ...
  cpu: 6
   clock: 0
    .base:        ffff88003e2ce180
    .offset:      0
    .get_time:    ktime_get
            EXPIRES                HRTIMER           FUNCTION
  1689390000000-1689390000000  ffff88003e2ce280  ffffffff8109f650  <tick_sched_timer>
  1692446941251-1692446941251  ffff88003e2ce400  ffffffff810d9790  <watchdog_timer_fn>
  3628519841476-3628519891476  ffff880033fd5eb8  ffffffff81091b70  <hrtimer_wakeup>
  1689390000000-1689390000000  ffff88003e2ce280  ffffffff8109f650  <tick_sched_timer>
  1692446941251-1692446941251  ffff88003e2ce400  ffffffff810d9790  <watchdog_timer_fn>
  3628519841476-3628519891476  ffff880033fd5eb8  ffffffff81091b70  <hrtimer_wakeup>
  1689390000000-1689390000000  ffff88003e2ce280  ffffffff8109f650  <tick_sched_timer>
  1692446941251-1692446941251  ffff88003e2ce400  ffffffff810d9790  <watchdog_timer_fn>
  3628519841476-3628519891476  ffff880033fd5eb8  ffffffff81091b70  <hrtimer_wakeup>
  1689390000000-1689390000000  ffff88003e2ce280  ffffffff8109f650  <tick_sched_timer>
  1692446941251-1692446941251  ffff88003e2ce400  ffffffff810d9790  <watchdog_timer_fn>
  3628519841476-3628519891476  ffff880033fd5eb8  ffffffff81091b70  <hrtimer_wakeup>
  1689390000000-1689390000000  ffff88003e2ce280  ffffffff8109f650  <tick_sched_timer>
  ... [ forever ] ...

Maybe you could use hq_open()/hq_enter()/hq_close() on the hrtimer
addresses to prevent this from happening, warn the user when it occurs,
and continue on with the next cpu?
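For instance, something like this sketch -- the iterator and dump
function names are placeholders, but the hq_*() calls are the existing
crash internals, and hq_enter() returns FALSE when it is handed an
address it has already seen:

  /*
   * Hypothetical sketch only -- guard the per-cpu hrtimer rbtree
   * walk against looping/corrupt trees with the built-in hash queue.
   */
  ulong hrtimer;

  hq_open();
  while (next_hrtimer_entry(&hrtimer)) {    /* placeholder iterator */
          if (!hq_enter(hrtimer)) {
                  error(INFO,
                      "duplicate hrtimer entry: %lx -- "
                      "continuing with next cpu\n", hrtimer);
                  break;
          }
          dump_hrtimer_entry(hrtimer);      /* placeholder */
  }
  hq_close();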

Thanks,
  Dave

