Re: Intel power tuning - 30% throughput performance increase

Wido den Hollander <wido@xxxxxxxx> · Wed, 3 May 2017 09:24:18 +0200 (CEST)

> On 3 May 2017 at 9:13, Blair Bethwaite <blair.bethwaite@xxxxxxxxx> wrote:
> 
> 
> Hi all,
> 
> We recently noticed that, despite having BIOS power profiles set to
> performance on our RHEL7 Dell R720 Ceph OSD nodes, CPU frequencies
> never seemed to reach the top of the range, and the CPUs in fact spent a
> lot of time in low C-states despite that BIOS option supposedly disabling
> C-states.
> 
> After some investigation, this C-state issue seems to be relatively common;
> apparently the BIOS setting is more of a config option that the OS can
> choose to ignore. You can check this by examining
> /sys/module/intel_idle/parameters/max_cstate
> - if this is >1 and you *think* C-states are disabled then your system is
> messing with you.
> 
> Because the contemporary Intel power management driver (
> https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt) now
> limits the proliferation of OS level CPU power profiles/governors, the only
> way to force top frequencies is to either set kernel boot command line
> options or use the /dev/cpu_dma_latency, aka pmqos, interface.
> 

You mean the kernel option 'intel_idle.max_cstate=1' here I think?
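
For anyone who wants to check this on their own nodes, here is a minimal sketch in Python. It reads the sysfs parameter mentioned above and looks for a given option on the kernel command line. The function names are made up for illustration, and the default paths assume a Linux host with the intel_idle driver loaded.

```python
from pathlib import Path


def read_max_cstate(path="/sys/module/intel_idle/parameters/max_cstate"):
    """Return intel_idle's max_cstate as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (driver not loaded) or content is not a number.
        return None


def cmdline_option(name, path="/proc/cmdline"):
    """Return the value of a name=value kernel boot option, or None if absent."""
    try:
        tokens = Path(path).read_text().split()
    except OSError:
        return None
    for token in tokens:
        key, sep, value = token.partition("=")
        if key == name and sep:
            return value
    return None


if __name__ == "__main__":
    print("max_cstate reported by intel_idle:", read_max_cstate())
    print("intel_idle.max_cstate on cmdline:", cmdline_option("intel_idle.max_cstate"))
```

If the first value is greater than 1 while the BIOS claims C-states are disabled, that matches the situation described above.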

> We did the latter using pmqos-static.py, which was previously part of
> the RHEL6 tuned latency-performance profile, but seems to have been dropped
> in RHEL7 (don't yet know why), and in any case the default tuned profile is
> throughput-performance (which does not change cpu_dma_latency). You can
> find the pmqos-static.py script here
> https://github.com/NetSys/NetBricks/blob/master/scripts/tuning/pmqos-static.py
> .
> 

Thanks for the script!
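
For readers who have not used the interface before, here is a rough sketch of what a cpu_dma_latency request looks like. As far as I understand the kernel PM QoS interface, a process writes a 32-bit latency value in microseconds to /dev/cpu_dma_latency, and the request stays active only while the file descriptor is held open. This is not a copy of pmqos-static.py; the function name is hypothetical, and writing to the real device needs root.

```python
import os
import struct


def hold_cpu_dma_latency(latency_us=0, device="/dev/cpu_dma_latency"):
    """Register a PM QoS latency request and return the open file descriptor.

    The request is dropped as soon as the descriptor is closed, so the
    caller must keep it open for as long as the setting should apply.
    """
    fd = os.open(device, os.O_WRONLY)
    try:
        # The interface accepts a native 32-bit integer (microseconds).
        os.write(fd, struct.pack("=i", latency_us))
    except OSError:
        os.close(fd)
        raise
    return fd
```

A long-running process would call `fd = hold_cpu_dma_latency(0)`, then sleep or do its work, and call `os.close(fd)` to release the request. That is why a script of this kind has to stay running in the background instead of exiting after the write.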

> After setting `./pmqos-static.py cpu_dma_latency=0` across our OSD nodes we
> saw a conservative 30% increase in backfill and recovery throughput - now
> when our main RBD pool of 900+ OSDs is backfilling we expect to see
> ~22GB/s, previously that was ~15GB/s.
> 

Is this an HDD or SSD cluster? I assume the latter, since HDDs are usually 100% busy during heavy recovery.

Do you also know how much more power these machines started to use? Your iDRAC might be able to tell you this.

> We have just got around to opening a case with Red Hat about this. At a
> minimum, Ceph should probably be actively using the pmqos interface, and
> tuned should be setting this, with recommendations for the
> latency-performance profile in the RHCS install guide. We have not yet
> characterised it on Ubuntu, but anecdotally it looks like it has similar
> issues on the same hardware.
> 

Would you maybe want to write a pull request to get this into docs.ceph.com?

Wido

> Merry xmas.
> 
> Cheers,
> Blair
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com