Re: [PATCH v2] ps: Add support to "ps -l|-m" to properly display process list

Austin Kim <austindh.kim@xxxxxxxxx> · Thu, 3 Mar 2022 22:19:06 +0900

Hello, Kazu

2022년 3월 3일 (목) 오후 4:50, HAGIO KAZUHITO(萩尾　一仁) <k-hagio-ab@xxxxxxx>님이 작성:
>
> Hi Austin,
>
> -----Original Message-----
> > > > This is because output of "ps -l|-m" depends on task_struct.sched_info.last_arrival.
> > > >
> > > > Without CONFIG_SCHEDSTATS or CONFIG_SCHED_INFO, 'sched_info' field is not included
> > > > in task_struct.
> > > >
> > > > So we make "ps -l|-m" option to access 'exec_start' field of sched_entity
> > > > where 'exec_start' is task_struct.se.exec_start.
> > >
> > > Could you describe what the exec_start means?  When is it updated?
> > >
> >
> > The 'task_struct.se.exec_start' contains the most recently-executed
> > timestamp when
> > process is running in the below cases;
> >
> >  - enqueued to runqueue
> >  - dequeued from ruqueue
> >  - scheduler tick is invoked
> >  - etc
> >
> > So I guess 'task_struct.se.exec_start' could be one of statistics
> > which indicates
> > the most recently run timestamp of process activity.
> >
> > From CFS scheduler's point of view, 'task_struct.se.exec_start' is
> > updated within update_curr()
> > where its call path is various as below.
> >
> >  - enqueue_task_fair, -dequeue_task_fair, task_tick_fair,
> > check_preempt_wakeup, ...
>
> Thank you for looking into this.
>
> As for when the se.exec_start is updated, I think you are right.
> (with my understanding, probably it's the last time or tick when a task
> is in a runqueue regardless of getting a CPU.)
>
> But I found a problem, is that the se.exec_start is from rq->clock_task,
> not from rq->clock like last_arrival.  The rq->clock_task may not contain
> irq/steal time, please see update_rq_clock_task().
>

I am understood after looking at below code.

static void sched_info_arrive(struct rq *rq, struct task_struct *t)
{
...
   now = rq_clock(rq);  //<<--
...
   t->sched_info.last_arrival = now;
}
static void update_curr(struct cfs_rq *cfs_rq)
{
   u64 now = rq_clock_task(rq_of(cfs_rq)); //<<--
...
   curr->exec_start = now;
...
}

> This causes the following issues with a vmcore generated on a machine,
> which had run for a while (273 days):
>

Let me ask a question at this point.
The vmcore was generated from machine which has been running 273 days.

I wonder the kernel image of machine(running 273 days) is without
CONFIG_SCHEDSTATS/CONFIG_SCHED_INFO.
If output is based on vmcore without CONFIG_SCHEDSTATS or CONFIG_SCHED_INFO,
you can ignore above question.

I would like to remind that as commit description said, the patch is
**only** valid
without CONFIG_SCHEDSTATS or CONFIG_SCHED_INFO.

When kernel image is with CONFIG_SCHEDSTATS or CONFIG_SCHED_INFO and then
"ps -l|-m" command is used, task_last_run() is called from its subroutines.

ulonglong
task_last_run(ulong task)
{
        ulong last_run;
...
    else if (VALID_MEMBER(sched_info_last_arrival)) // With
CONFIG_SCHEDSTATS or CONFIG_SCHED_INFO
            timestamp = tt->last_task_read ?  ULONGLONG(tt->task_struct +
            OFFSET(task_struct_sched_info) +
            OFFSET(sched_info_last_arrival)) : 0;
+    else if (VALID_MEMBER(sched_entity_exec_start)) // Without
CONFIG_SCHEDSTATS or CONFIG_SCHED_INFO
+            timestamp = tt->last_task_read ?  ULONGLONG(tt->task_struct +
+            OFFSET(task_struct_se) +
+            OFFSET(sched_entity_exec_start)) : 0;

        return timestamp;
}

With CONFIG_SCHEDSTATS or CONFIG_SCHED_INFO enabled,
the statement under 'VALID_MEMBER(sched_entity_exec_start)' is not called.

> crash> ps -m | head
> [  0 00:59:36.582] [RU]  PID: 4023608  TASK: ffff916f7c6b1840  CPU: 15  COMMAND: "makedumpfile"
>         ^^^^^^^^^(1)
> [  0 00:59:37.831] [ID]  PID: 413      TASK: ffff916f772d3080  CPU: 15  COMMAND: "kworker/15:1H"
> [  0 00:59:39.765] [IN]  PID: 3929504  TASK: ffff916f5f0748c0  CPU: 15  COMMAND: "respawn_actlog"
> [  0 00:59:40.650] [IN]  PID: 1974     TASK: ffff91647dc53080  CPU: 15  COMMAND: "CPU 2/KVM"
> [  0 00:59:41.925] [IN]  PID: 1297     TASK: ffff916f63c46100  CPU: 15  COMMAND: "NetworkManager"
> [  0 00:59:42.944] [ID]  PID: 3763057  TASK: ffff9160c4519840  CPU: 15  COMMAND: "kworker/15:0"
> [  0 00:59:42.944] [IN]  PID: 101      TASK: ffff916040c91840  CPU: 15  COMMAND: "migration/15"
> [  0 00:59:43.078] [IN]  PID: 100      TASK: ffff916040c5b080  CPU: 15  COMMAND: "watchdog/15"
> [  0 00:59:47.533] [IN]  PID: 1292     TASK: ffff916f63c43080  CPU: 15  COMMAND: "lsmd"
> [  0 00:59:49.089] [IN]  PID: 113105   TASK: ffff9160412248c0  CPU: 15  COMMAND: "kvm-nx-lpage-re"
>                                                                ^^^^^^^(2)
> (1) large difference from zero
> (2) large differences among CPUs (probably due to the differences of irq time)
>
> (1) might be solved with rq->clock_task, but (2) will be misleading and confusing.
> So currently I'm thinking that the "ps -l|-m" options should not use the se.exec_start.
> What do you think?
>

Agreed. (As you indicated) Since rq->clock cannot hold accurate
timestamp we can rely on,
"ps -l|-m" options should not access the se.exec_start in case of kernel image
without CONFIG_SCHEDSTATS or CONFIG_SCHED_INFO.

Thanks for your code review.
If there is something to improve, let me propose a patch again.

BR,
Austin Kim

> Thanks,
> Kazu
>

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki