Re: RFC vmstat: On demand vmstat threads

Gilad Ben-Yossef <gilad@xxxxxxxxxxxxx> · Thu, 19 Sep 2013 11:54:58 -0500

On Wed, Sep 18, 2013 at 5:06 PM, Andrew Morton
<akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 10 Sep 2013 21:13:34 +0000 Christoph Lameter <cl@xxxxxxxxx> wrote:
>

>> With this patch it is possible then to have periods longer than
>> 2 seconds without any OS event on a "cpu" (hardware thread).
>
> It would be useful (actually essential) to have a description of why
> anyone cares about this.  A good and detailed description, please.

Let me have a stab at this:

The existing vmstat_update mechanism depends on a deferrable timer
firing every second
by default which registers a work queue item that runs on the local
CPU, with the result
that we have 1 interrupt and one additional schedulable task on each
CPU aprox. every second.

If your workload indeed causes VM activity or you are running multiple
tasks per CPU, you probably
have bigger issues to deal with.

However, many existing workloads dedicate a CPU for a single  CPU bound task.
This is done by high performance computing folks, by high frequency
financial applications  folks,
by networking folks  (Intel DPDK, EZchip NPS) and with the advent of
systems with more and more
CPUs over time, this  will(?) become more and more common to do since
when you have enough CPUs
you care less about efficiently sharing your CPU with other tasks and
more about
efficiently monopolizing a CPU per task.

The difference of having this timer firing and workqueue kernel thread
scheduled per second can be enormous.
An artificial test I made measuring the worst case time to do a simple
"i++" in an endless loop on a bare metal
system and under Linux on an isolated CPU (cpusets or isolcpus - take
your pick) with dynticks and with and
without this patch, have Linux match the bare metal performance (~700
cycles) with this patch and loose by
couple of orders of magnitude (~200k cycles) without it[*]  - and all
this for something that just calculates statistics.
For  networking applications, for example, this is the difference
between dropping packets or sustaining line rate.

Statistics are important and useful, but if there is a way to not
cause statistics gathering produce
such a huge performance difference would be great. This is what we are
trying to do here.

Does it makes sense?

[*] To be honest it required one more patch, but this one or something
like is needed to get that one working, so...

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@xxxxxxxxxxxxx
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>