On Thu, May 26, 2011 at 5:05 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Thu, 26 May 2011 14:07:49 -0700
> Ying Han <yinghan@xxxxxxxxxx> wrote:
> This adds a histogram to capture pagefault latencies on a per-memcg basis. I
> used this patch in the memcg background reclaim test, and figured there could
> be more use cases for monitoring/debugging application performance.
>
> The histogram is composed of 8 buckets, in ns units. The last bucket is
> infinite (inf), counting everything beyond the last finite boundary. To be
> more flexible, the histogram can be reset and each bucket boundary is
> configurable at runtime.
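For illustration, a minimal sketch of how such per-memcg buckets might be kept
(the names here are made up, not necessarily what the patch uses):

#define NR_BUCKETS 8

struct pgfault_histogram {
	u64 bounds[NR_BUCKETS - 1];	/* ascending upper bounds, in ns */
	atomic64_t counts[NR_BUCKETS];	/* last slot is the "inf" bucket */
};

/* Bump the first bucket whose upper bound the latency falls under. */
static void pgfault_histogram_record(struct pgfault_histogram *h, u64 ns)
{
	int i;

	for (i = 0; i < NR_BUCKETS - 1; i++)
		if (ns < h->bounds[i])
			break;
	atomic64_inc(&h->counts[i]);	/* i == NR_BUCKETS - 1 means "inf" */
}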
>
> memory.pgfault_histogram: exports the histogram on a per-memcg basis, and the
> histogram can be reset by echoing "reset" into the file. Meanwhile, the bucket
> boundaries are writable by echoing a new range into the API; see the examples
> below.
>
> /proc/sys/vm/pgfault_histogram: the global sysctl tunable that can be used to
> turn recording of the histogram on and off.
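A sketch of how the fault path might consult that switch, assuming a
hypothetical vm_pgfault_histogram flag behind the sysctl and the
pgfault_histogram_record() helper sketched above:

extern int vm_pgfault_histogram;	/* backs /proc/sys/vm/pgfault_histogram */

/* Call with a sched_clock() timestamp taken before the fault was handled. */
static inline void record_pgfault_latency(struct mem_cgroup *memcg, u64 start)
{
	if (!vm_pgfault_histogram)
		return;
	/* pgfault_hist is a hypothetical per-memcg field. */
	pgfault_histogram_record(&memcg->pgfault_hist, sched_clock() - start);
}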
>
> Functional Test:
> Create a memcg with a 10g hard limit, run dd in the background, and allocate
> 8g of anon pages. Measure the anon page allocation latency.
>
> $ mkdir /dev/cgroup/memory/B
> $ echo 10g >/dev/cgroup/memory/B/memory.limit_in_bytes
> $ echo $$ >/dev/cgroup/memory/B/tasks
> $ dd if=/dev/zero of=/export/hdc3/dd/tf0 bs=1024 count=20971520 &
> $ allocate 8g anon pages
>
> $ echo 1 >/proc/sys/vm/pgfault_histogram
>
> $ cat /dev/cgroup/memory/B/memory.pgfault_histogram
> pgfault latency histogram (ns):
> < 600 2051273
> < 1200 40859
> < 2400 4004
> < 4800 1605
> < 9600 170
> < 19200 82
> < 38400 6
> < inf 0
>
> $ echo reset >/dev/cgroup/memory/B/memory.pgfault_histogram
> $ cat /dev/cgroup/memory/B/memory.pgfault_histogram
> pgfault latency histogram (ns):
> < 600 0
> < 1200 0
> < 2400 0
> < 4800 0
> < 9600 0
> < 19200 0
> < 38400 0
> < inf 0
>
> $ echo 500 520 540 580 600 1000 5000 >/dev/cgroup/memory/B/memory.pgfault_histogram
> $ cat /dev/cgroup/memory/B/memory.pgfault_histogram
> pgfault latency histogram (ns):
> < 500 50
> < 520 151
> < 540 3715
> < 580 1859812
> < 600 202241
> < 1000 25394
> < 5000 5875
> < inf 186
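To make the write side concrete, a rough sketch of how such input might be
parsed (function and field names are illustrative): "reset" zeroes the counts,
anything else is read as NR_BUCKETS-1 ascending boundaries.

static int pgfault_histogram_write(struct pgfault_histogram *h, char *buf)
{
	int i;

	buf = strim(buf);
	if (!strcmp(buf, "reset")) {
		for (i = 0; i < NR_BUCKETS; i++)
			atomic64_set(&h->counts[i], 0);
		return 0;
	}

	/* Expect NR_BUCKETS-1 ascending bounds, e.g. "500 520 ... 5000". */
	for (i = 0; i < NR_BUCKETS - 1; i++) {
		char *tok = strsep(&buf, " ");

		if (!tok || kstrtou64(tok, 10, &h->bounds[i]))
			return -EINVAL;
		if (i > 0 && h->bounds[i] <= h->bounds[i - 1])
			return -EINVAL;
	}
	return 0;
}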
>
> Performance Test:
> I ran the PageFaultTest (pft) benchmark to measure the overhead of recording
> the histogram. No overhead is observed in either "flt/cpu/s" or "fault/wsec".
>
> $ mkdir /dev/cgroup/memory/A
> $ echo 16g >/dev/cgroup/memory/A/memory.limit_in_bytes
> $ echo $$ >/dev/cgroup/memory/A/tasks
> $ ./pft -m 15g -t 8 -T a
>
> Result:
> "fault/wsec"
>
> $ ./ministat no_histogram histogram
> x no_histogram
> + histogram
> +--------------------------------------------------------------------------+
> N Min Max Median Avg Stddev
> x 5 813404.51 824574.98 821661.3 820470.83 4202.0758
> + 5 821228.91 825894.66 822874.65 823374.15 1787.9355
>
> "flt/cpu/s"
>
> $ ./ministat no_histogram histogram
> x no_histogram
> + histogram
> +--------------------------------------------------------------------------+
> N Min Max Median Avg Stddev
> x 5 104951.93 106173.13 105142.73 105349.2 513.78158
> + 5 104697.67 105416.1 104943.52 104973.77 269.24781
> No difference proven at 95.0% confidence
>
> Signed-off-by: Ying Han <yinghan@xxxxxxxxxx>
>
> Hmm, interesting....but isn't it a very, very complicated interface?
> Could you make this for 'perf'? Then everyone (including those who don't use
> memcg) will be happy.
>
> Thanks,
> -Kame

Thank you for looking at it.
There is only one per-memcg API added, which basically exports the histogram.
The "reset" and bucket-reconfiguration features are not a must, but they make
it more flexible. The sysctl knob can also be dropped if necessary, since no
overhead is observed from leaving recording on all the time anyway.
I am not familiar with perf; any suggestions on what that is supposed to look like?
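One possible shape, purely as a sketch: a tracepoint that emits the raw latency
for each fault, so perf (or anything else) can build whatever histogram it
wants in userspace. The event name and fields below are made up:

/* include/trace/events/memcg.h (hypothetical) */
TRACE_EVENT(memcg_pgfault_latency,

	TP_PROTO(unsigned int css_id, u64 latency_ns),

	TP_ARGS(css_id, latency_ns),

	TP_STRUCT__entry(
		__field(unsigned int, css_id)
		__field(u64, latency_ns)
	),

	TP_fast_assign(
		__entry->css_id = css_id;
		__entry->latency_ns = latency_ns;
	),

	TP_printk("css_id=%u latency_ns=%llu", __entry->css_id,
		  (unsigned long long)__entry->latency_ns)
);

With something like that, recording would not need any memcg-specific files:

$ perf record -e memcg:memcg_pgfault_latency -a sleep 10
$ perf script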
Thanks
--Ying