On Tue, 16 Nov 2010 11:24:43 +0100
Rob van der Heij <rvdheij@xxxxxxxxx> wrote:

> On Tue, Nov 16, 2010 at 9:54 AM, Martin Schwidefsky
> <schwidefsky@xxxxxxxxxx> wrote:
>
> > There are basically two things we want to accomplish:
> >
> > 1) Make top faster by replacing the /proc based data gathering with
> >    taskstats. To really make it go fast with taskstats we filter with
> >    last_depart to read only tasks that have changed since the last
> >    snapshot.
> > 2) Make top more precise. That is where all of the extra accounting
> >    comes into play.
>
> I've been lurking mostly (other than a private note to Michael about
> this a few weeks ago) but...
>
> Re: 2) I really think this is a fruitless effort. With a tool like
> top showing you one single interval of random length, the data
> presented is for recreational purposes at best. Given that, it
> provides the justification for 1) like "when the data is useless, it
> should be cheap to collect" :-)
> IIRC one of the initial anomalies with thread accounting was also
> caused by attempts to make the output of top correct, while missing
> the overall consistency and capture ratio.
>
> I certainly value proper accounting. But I'm frequently looking at
> systems where 10-20% of the consumed CPU capacity is attributed to
> kernel daemons (journaling, swapping, interrupt handling, etc). When
> we can't attribute that to individual processes, it may not be
> interesting to worry about the precision you talk about.

There we get into the area of kernel daemons doing work on behalf of
another task. This is a can of worms and really hard to solve. You would
need instrumentation throughout the kernel code to attribute the time
correctly based on requests. Now consider things like block request
merging or the sharing of file pages. Not nice at all.

> If I understand the proposed solution, it remains a "sample based"
> accounting. That means your granularity is limited by the rate with
> which you can sample.
> When the cost of sampling is an order of
> magnitude less, you can afford to sample more often and achieve higher
> granularity at the same cost. High capture ratio is more important
> than high granularity.

The taskstats interface has a sample-based and an event-driven part. You
can take a snapshot at a time of your choosing. With exit events you
will be notified whenever a process dies and you get the last valid
accounting information. With Michael's proposed new taskstats command
you can restrict the amount of data for the sample-based approach to
the tasks that did something.

For top we try to avoid the exit events because there can be quite a
lot of them and it costs quite a bit of cpu to process them. We use the
precise top to see if we missed something. If the amount of accounted
cpu time does not add up to #cpus * 100%, something is wrong.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe from this list: send the line "unsubscribe linux-s390" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
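The "#cpus * 100%" sanity check above can be sketched as follows. This is
only an illustrative calculation with made-up sample numbers, not the
actual top code; real per-task deltas would come from two taskstats
snapshots taken at the start and end of the interval:

```python
# Capture-ratio sanity check: over a sampling interval, the cpu time
# accounted to all tasks plus idle time should add up to
# (number of cpus) * (interval length).

NR_CPUS = 4
INTERVAL = 2.0  # sampling interval in seconds

# hypothetical (utime + stime) delta per task over the interval, seconds
task_cpu_deltas = [1.9, 1.7, 0.4, 0.3, 3.2]

# hypothetical time spent in the idle loop over the same interval
idle_time = 0.5

accounted = sum(task_cpu_deltas) + idle_time
capacity = NR_CPUS * INTERVAL  # total cpu capacity of the interval

# capture ratio: fraction of total capacity we can attribute
capture_ratio = accounted / capacity
print(f"capture ratio: {capture_ratio:.2%}")

# A ratio well below 100% means consumers were missed, e.g. tasks that
# exited between the two snapshots whose exit events were not processed.
```

If the ratio is noticeably below 100%, the missing time is exactly the
kind of gap the exit events (or a more precise top) would reveal.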