Re: [PATCH v2 00/25] support hiding data of offline cpu

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




----- Original Message -----
> On 09/30/2014 04:00 AM, Dave Anderson wrote:
> >
> >
> > ----- Original Message -----
> >> For those who don't care about the removed cpu, data of offline cpu is
> >> bothering. This patchset is trying to hide data of offline cpu. Please
> >> check each patch to see what is hiden.
> >>
> >> The last version is here:
> >> https://www.redhat.com/archives/crash-utility/2014-September/msg00000.html
> >>
> >> Changelog:
> >> v1->v2:
> >> 1. patch 1: add environment variable and its argument to hide/show
> >>     data of offline cpu.
> >> 2. add description of environment variable to crash.8
> >> 3. remove the restrict of x86_64
> >> 4. add "<OFFLINE>" to indicate those hiden data.
> >> 5. patch 22/23: addd "<OFFLINE>" to indicate idle task on offline
> >>     cpu
> >>
> >> Qiao Nuohan (25):
> >>    add a flag to display/hide data of offline cpus
> >>    add an API to check whether to hide a cpus' data
> >>    x86_64: modify help -m/-M to hide offline cpus' data
> >>    x86_64: modify bt -E to hide offline cpus' data
> >>    x86_64: modify mach to hide offline cpus' data
> >>    x86_64: modify mach -c to hide offline cpus' data
> >>    modify help -r to hide offline cpus' registers
> >>    modify bt -c to hide offline cpus' data
> >>    modify display_sys_stats() to hide cpus' number
> >>    modify help -k to hide offline cpus' number
> >>    modify set -c to hide offline cpu
> >>    modify irq -s to hide offline cpus' data
> >>    modify irq -a to hide offline cpus' data
> >>    modify timer -r to hide offline cpus' data
> >>    modify timer to hide offline cpus' data
> >>    modify ptov offset:cpuspec to hide offline cpus' data
> >>    fix max_cpudata_limit() when offlined cpu exists
> >>    modify kmem -o to hide offline cpus' data
> >>    modify kmem -S(SLUB) to hide offline cpus' data
> >>    modify struct/union/* [:cpuspec] to hide offline cpus' data
> >>    modify command p to hide offline cpus' data
> >>    modify ps to indicate idle task on offline cpu
> >>    modify print_task_header() to indicate idle task on offline cpu
> >>    modify ps -l/-m -C cpu to hide offline cpus' data
> >>    modify runq to hide offline cpus' data
> >>
> >>   crash.8    |  4 +++
> >>   defs.h     |  3 ++
> >>   diskdump.c |  6 +++-
> >>   help.c     |  6 ++++
> >>   kernel.c   | 83 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >>   main.c     | 15 +++++++++-
> >>   memory.c   | 30 +++++++++++++++----
> >>   netdump.c  | 25 ++++++++++++++--
> >>   symbols.c  | 15 +++++++++-
> >>   task.c     | 97
> >>   ++++++++++++++++++++++++++++++++++++++++++++++++++++----------
> >>   tools.c    | 15 ++++++++++
> >>   x86_64.c   | 74 +++++++++++++++++++++++++++++++++++++++--------
> >>   12 files changed, 331 insertions(+), 42 deletions(-)
> >>
> >> --
> >> 1.8.5.3
> >
> >
> > Hello Qiao,
> >
> > In the future, can you please post the patches as attachments?
> > Unfortunately my Zimbra email web interface replace tabs with spaces,
> > so I have to manually cut-and-paste each patch from the crash-utility
> > mailing list archives.
> 
> I see.
> 
> >
> > I have a problem with a number of the patches, some of which
> > I have NAK'd completely, and others I would suggest changing
> > the output slightly.
> >
> > As I recall, one of your arguments for doing this is to avoid
> > verbose display of offline cpu data.  But in a few of your
> > patches, it simply replaces a one-line per-cpu piece of data
> > with the<OFFLINE>  string.  To me it makes more sense to
> > continue to display the data with an "[OFFLINE]" tag appended
> > to the end of the output line.
> >
> > Also, I had asked that there be *no* behavioral changes
> > unless the "offline" environment variable was set to "hide".
> > But there are a few patches that change things regardless of
> > the setting.  So for the most part, if any of the individual
> > patches do *not* use your hide_offline_cpu() function, then
> > it's highly likely that it has been NAK'd.
> >
> > Lastly, can you make your output show "[OFFLINE]" instead
> > of "<OFFLINE>"?   I'd prefer to keep the usage of "[ brackets ]"
> > as "informational" in nature, whereas "<  braces>" are typically
> > used to surround symbol names.
> 
> I would change it.
> 
> >
> > Here are the patches that I have NAK'd or would like changed:
> >
> > [PATCH v2 03/25] x86_64: modify help -m/-M to hide offline cpus' data
> >
> >    NAK.  This patch makes no sense to me.  It's a debug command that
> >    shows the contents of the crash-internal machdep and machine_specific
> >    data structures.  What possible benefit would there be to hide the
> >    legitimate addresses/contents?  Please leave them as they are.
> >
> > [PATCH v2 05/25] x86_64: modify mach to hide offline cpus' data
> >
> >    Please continue to display the stack addresses, but *followed by*
> >    "[OFFLINE]".
> >
> > [PATCH v2 09/25] modify display_sys_stats() to hide cpus' number
> >
> >    Here you are changing crash behavior regardless of the "offline"
> >    setting.  While it is true that you are now mimicking the PPC64
> >    behaviour, that is because I typically allow IBM to do their own
> >    thing with respect to the PPC64 (and S390/S390X) architectures.
> >    What I would suggest you do is this: for all architectures except
> >    PPC64, continue to show kt->cpus, but append the number of offlined
> >    cpus after it, i.e., something like this:
> >
> >        KERNEL: ../kdump/vmlinux
> >      DUMPFILE: ../kdump/vmcore  [PARTIAL DUMP]
> >          CPUS: 4 [1 OFFLINE]
> >          DATE: Tue Sep 23 14:54:26 2014
> >        UPTIME: 00:02:45
> 
> I see. ppc64 bothers me.
> 
> >
> > [PATCH v2 10/25] modify help -k to hide offline cpus' number
> >
> >    NAK.  "help -k" is a debug option whose purpose is to dump the
> >    contents of the crash-internal kernel_table; accordingly, the
> >    "cpus" output should show the contents of the kt->cpus member
> >    regardless whether there are offline cpus.
> >
> > [PATCH v2 13/25] modify irq -a to hide offline cpus' data
> >
> >    NAK.  This command essentially shows the contents of a
> >    cpu mask in a kernel data structure -- regardless whether
> >    the cpu is offline.
> >
> > [PATCH v2 16/25] modify ptov offset:cpuspec to hide offline cpus' data
> >
> >    There is no reason to *not* show the per-cpu VIRTUAL address, and
> >    you're not saving any space.  Instead please show the virtual address
> >    *followed by* "[OFFLINE]".
> >
> > [PATCH v2 18/25] modify kmem -o to hide offline cpus' data
> >
> >    There is no reason to *not* show the per-cpu OFFSET values, and
> >    you're not saving any space.  Instead please show the offset value
> >    *followed by* "[OFFLINE]".
> >
> > [PATCH v2 22/25] modify ps to indicate idle task on offline cpu
> >
> >    NAK.  This patch changes behavior regardless of the "offline" setting.
> >    The swapper task *does* exist, and it actually is the active task on
> >    that (offline) cpu.  Plus you're doing it only for the active task
> >    on the offline cpu -- but there would still be several other kernel
> >    tasks shown for that cpu even when it is offline.
> 
> The swapper task of offline cpu showed by ps command is strange to me.
> It shows the task is still running, with ">" at the start of the line
> and "ST" is "RU", but the cpu is halt. So I think it is needed to change
> the display of these tasks.
> 
> I will change the display when "offline" is "hide". And if you don't
> like add "OFFLINE" at the end, What about change "RU" to other(like "OF")
> and delete ">", just like:
> 
> <cut>
>     PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
> ...
>        0      0   2  ffff88003dad5b00  OF   0.0       0      0  [swapper/2] <OFFLINE>
> ...
> <cut>

Regardless whether the cpu is online or not, the cpu's swapper task exists
and is always runnable, so the output is showing its legitimate state.  And
since no other task is queued on the cpu's run queue, then it's essentially
the "active" task on that cpu, waiting for it to come back online.  I'm not
sure why this is so bothersome to you?

I don't like the idea of "OF" in the state column, because that field shows
the contents of the task's "task_struct.state" member.  The command should
show what's in the kernel data structure, not some "made-up" state.

Interesting that you don't seem to be bothered by the other kernel threads
assigned to an offline'd cpu.  For example if I take cpu 2 offline on my machine,
it shows this, where PIDs 17, 18 and 19 have "??" status indicators:

crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      0      0   0  ffffffff81c13440  RU   0.0       0      0  [swapper/0]
>     0      0   1  ffff88021282d330  RU   0.0       0      0  [swapper/1]
>     0      0   2  ffff88021282dac0  RU   0.0       0      0  [swapper/2]
>     0      0   3  ffff88021282e250  RU   0.0       0      0  [swapper/3]
      1      0   3  ffff880212828000  IN   0.0   50252   2836  systemd
      2      0   0  ffff880212828790  IN   0.0       0      0  [kthreadd]
      3      2   0  ffff880212828f20  IN   0.0       0      0  [ksoftirqd/0]
      5      2   0  ffff880212829e40  IN   0.0       0      0  [kworker/0:0H]
      7      2   0  ffff88021282ad60  IN   0.0       0      0  [kworker/u:0H]
      8      2   0  ffff88021282b4f0  IN   0.0       0      0  [migration/0]
      9      2   0  ffff88021282bc80  IN   0.0       0      0  [rcu_bh]
     10      2   3  ffff88021282c410  IN   0.0       0      0  [rcu_sched]
     11      2   0  ffff88021282cba0  IN   0.0       0      0  [watchdog/0]
     12      2   1  ffff8802128e0f20  IN   0.0       0      0  [watchdog/1]
     13      2   1  ffff8802128e16b0  IN   0.0       0      0  [migration/1]
     14      2   1  ffff8802128e1e40  IN   0.0       0      0  [ksoftirqd/1]
     16      2   1  ffff8802128e2d60  IN   0.0       0      0  [kworker/1:0H]
     17      2   2  ffff8802128e34f0  ??   0.0       0      0  [watchdog/2]
     18      2   2  ffff8802128e3c80  ??   0.0       0      0  [migration/2]
     19      2   2  ffff8802128e4410  ??   0.0       0      0  [ksoftirqd/2]
     21      2   2  ffff8802128e5330  IN   0.0       0      0  [kworker/2:0H]
     22      2   3  ffff8802128e5ac0  IN   0.0       0      0  [watchdog/3]
     23      2   3  ffff8802128e6250  IN   0.0       0      0  [migration/3]
     24      2   3  ffff8802128e69e0  IN   0.0       0      0  [ksoftirqd/3]

Those 3 threads have states of TASK_PARKED (512), which is currently not
recognized by the crash utility.  So I queued this fix for that issue today:

  https://github.com/crash-utility/crash/commit/4c0a1b34d47b4d34047a00b021e66e8585189d5b 

  Update the "ps" command's "ST" task state display to recognize the
  TASK_PARKED state in Linux 3.9 and later kernels.  Without the patch,
  the command's "ST" column entry for parked tasks shows "??".  The
  state column will now show "PA", and the foreach command will accept
  "PA" as a "state" argument.
  (anderson@xxxxxxxxxx)

> 
> >
> > [PATCH v2 23/25] modify print_task_header() to indicate idle task on
> > offline cpu
> >
> >    NAK.  This patches changes behavior regardless of the "offline" setting.
> >    The print_task_header() function is called by 53 different functions,
> >    and
> >    I'm not going to even bother checking each one for relevancy.
> 
> Same as 22/25, change "ST" to "OF"?

Why bother?  It's in the time-sorted list based upon the last time it was
scheduled as an "RU" process.  I would just leave it alone.

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility




[Index of Archives]     [Fedora Development]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]

 

Powered by Linux