Re: Intel power tuning - 30% throughput performance increase

One of the things I've noticed in the latest (3+ years) batch of CPUs
is that they increasingly ignore the OS frequency-scaling drivers and
do what they want. On top of that, interfaces like /proc/cpuinfo
report frequencies that are simply incorrect.

I keep checking the real frequencies with tools like "i7z", which
reports the actual per-core frequency.

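If you want a quick scripted sanity check rather than watching i7z,
something like this rough sketch reads the per-core view from sysfs
(note that depending on kernel/driver scaling_cur_freq can be the
requested rather than the measured frequency, which is exactly why
tools that read the MSRs directly are more trustworthy):

#!/usr/bin/env python
# Rough per-core frequency check via sysfs (values are in kHz).
# Treat as a sanity check only; i7z reads the MSRs for real numbers.
import glob

for path in sorted(glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq")):
    cpu = path.split("/")[5]              # e.g. "cpu0"
    with open(path) as f:
        khz = int(f.read().strip())
    print("%s: %.0f MHz" % (cpu, khz / 1000.0))
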
On the flip side, as more of this is controlled directly by the CPU
itself, it should also be safer to run this way over long periods.

In my testing, done on Trusty with the stock 3.13 kernel, switching
the governor to performance and disabling all power-saving options in
the BIOS gave circa 50% better latency (for both the SSD and the HDD
cluster), with around a 10% increase in power usage.
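For anyone wanting to reproduce that, the governor switch is just a
sysfs write per core; a rough sketch (assumes the cpufreq sysfs
interface is present, run as root):

#!/usr/bin/env python
# Sketch: set every core's cpufreq governor to "performance".
import glob

for path in glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
    with open(path, "w") as f:
        f.write("performance\n")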

On Wed, May 3, 2017 at 8:43 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi Blair,
>
> We use cpu_dma_latency=1, because it was in the latency-performance profile.
> And indeed by setting cpu_dma_latency=0 on one of our OSD servers,
> powertop now shows the package as 100% in turbo mode.
>
> So I suppose we'll pay for this performance boost in energy.
> But more importantly, can the CPU survive being in turbo 100% of the time?
>
> -- Dan
>
>
>
> On Wed, May 3, 2017 at 9:13 AM, Blair Bethwaite
> <blair.bethwaite@xxxxxxxxx> wrote:
>> Hi all,
>>
>> We recently noticed that despite having the BIOS power profile set to
>> performance on our RHEL7 Dell R720 Ceph OSD nodes, CPU frequencies
>> never seemed to reach the top of the range, and in fact the cores
>> spent a lot of time in low-power C-states despite that BIOS option
>> supposedly disabling C-states.
>>
>> After some investigation, this C-state issue seems to be relatively
>> common; apparently the BIOS setting is more of a config option that
>> the OS can choose to ignore. You can check this by examining
>> /sys/module/intel_idle/parameters/max_cstate - if this is >1 and you
>> *think* C-states are disabled, then your system is messing with you.
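
A slightly fuller version of that check, for anyone scripting it, is
to read max_cstate together with the per-state idle residencies from
cpuidle; a rough sketch (assumes the intel_idle driver and the cpuidle
sysfs interface are present):

#!/usr/bin/env python
# Sketch: report intel_idle's max_cstate plus how long cpu0 has
# actually spent in each idle state (cpuidle "time" is microseconds).
import glob
import os

with open("/sys/module/intel_idle/parameters/max_cstate") as f:
    print("intel_idle max_cstate: %s" % f.read().strip())

for state in sorted(glob.glob(
        "/sys/devices/system/cpu/cpu0/cpuidle/state[0-9]*")):
    with open(os.path.join(state, "name")) as f:
        name = f.read().strip()
    with open(os.path.join(state, "time")) as f:
        usec = int(f.read().strip())
    print("cpu0 %-10s %15d us" % (name, usec))
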
>>
>> Because the contemporary Intel power management driver
>> (https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt)
>> now limits the choice of OS-level CPU power profiles/governors, the
>> only way to force top frequencies is either to set kernel boot
>> command line options or to use the /dev/cpu_dma_latency, aka pmqos,
>> interface.
>>
>> We did the latter using pmqos-static.py, which was previously part of
>> the RHEL6 tuned latency-performance profile but seems to have been
>> dropped in RHEL7 (we don't yet know why); in any case the default
>> tuned profile is throughput-performance, which does not change
>> cpu_dma_latency. You can find the pmqos-static.py script here:
>> https://github.com/NetSys/NetBricks/blob/master/scripts/tuning/pmqos-static.py
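
For anyone who would rather not pull in that script, the underlying
mechanism is small: open /dev/cpu_dma_latency, write the target
latency as a binary 32-bit integer, and keep the file descriptor open
for as long as you want the constraint to hold (the kernel drops the
request when the fd is closed). A minimal sketch of the same idea:

#!/usr/bin/env python
# Sketch: hold a cpu_dma_latency=0 PM QoS request for the life of
# this process. The kernel only honours the request while the file
# descriptor stays open, so this has to keep running; run as root.
import signal
import struct

with open("/dev/cpu_dma_latency", "wb") as f:
    f.write(struct.pack("i", 0))   # requested latency in microseconds
    f.flush()
    signal.pause()                 # block until killed; fd stays open
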
>>
>> After setting `./pmqos-static.py cpu_dma_latency=0` across our OSD
>> nodes we saw a conservative 30% increase in backfill and recovery
>> throughput - now when our main RBD pool of 900+ OSDs is backfilling
>> we expect to see ~22GB/s, up from ~15GB/s previously.
>>
>> We have just got around to opening a case with Red Hat regarding
>> this, as at minimum Ceph should probably be actively using the pmqos
>> interface, and tuned should be setting this, with recommendations
>> for the latency-performance profile in the RHCS install guide. We
>> have done no characterisation of it on Ubuntu yet, but anecdotally
>> it looks like it has similar issues on the same hardware.
>>
>> Merry xmas.
>>
>> Cheers,
>> Blair
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


