Re: How to use cgroup to bind ceph-osd to a specific cpu core?

Jan,
Thanks a lot. I'll contribute to this project if I can.

Best Regards
-- Ray

On Tue, Jun 30, 2015 at 11:50 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
Hi all,
our script is available on GitHub


I haven’t had much time to write a proper README, but I hope the configuration is self-explanatory enough for now.
What it does is pin each OSD into the least-loaded (most “empty”) cgroup assigned to a NUMA node.
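
In essence it does something like the following - a simplified sketch, not the actual script, using the cgroup v1 cpuset controller, with a made-up group name and $OSD_PID as a placeholder:

    # one cpuset cgroup per NUMA node
    mkdir -p /sys/fs/cgroup/cpuset/ceph_node0
    # give the group node 0's CPUs and node 0's memory
    cat /sys/devices/system/node/node0/cpulist \
        > /sys/fs/cgroup/cpuset/ceph_node0/cpuset.cpus
    echo 0 > /sys/fs/cgroup/cpuset/ceph_node0/cpuset.mems
    # move an OSD (all of its threads) into the group
    echo $OSD_PID > /sys/fs/cgroup/cpuset/ceph_node0/cgroup.procs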

Let me know how it works for you!

Jan


On 30 Jun 2015, at 10:50, Huang Zhiteng <winston.d@xxxxxxxxx> wrote:



On Tue, Jun 30, 2015 at 4:25 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
Not having OSDs and KVMs compete against each other is one thing.
But there are more reasons to do this:

1) processes and threads don’t move between cores as much (better cache utilization)
2) the processes are aligned with their memory on NUMA systems (that means all modern dual-socket systems) - you don’t want your OSD running on CPU1 with its memory allocated next to CPU2
3) the same goes for other resources like NICs or storage controllers - but that’s less important and not always practical to do
4) you can limit the scheduling domain on Linux if you limit the cpuset for your OSDs (I’m not sure how important this is, just best practice)
5) you can easily limit memory or CPU usage and set priorities with much finer granularity than without cgroups (see the sketch below this list)
6) if you have HyperThreading enabled you get the most gain when the workloads on the two threads are dissimilar - so for the highest throughput you’d pin the OSD to thread1 and KVM to thread2 on the same core. We’re not doing that, because the latency and performance of the core can vary depending on what the other thread is doing. But it might be useful to someone.
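
To give an idea of point 5, a rough sketch (cgroup v1 again, with a made-up group name “osd0” and $OSD_PID standing in for a real PID):

    # create groups under the memory and cpu controllers
    mkdir -p /sys/fs/cgroup/memory/osd0 /sys/fs/cgroup/cpu/osd0
    # cap the OSD at 4 GB of RAM
    echo 4G > /sys/fs/cgroup/memory/osd0/memory.limit_in_bytes
    # double the default CPU weight (1024) relative to other groups
    echo 2048 > /sys/fs/cgroup/cpu/osd0/cpu.shares
    # move the whole process (all threads) into both groups
    echo $OSD_PID > /sys/fs/cgroup/memory/osd0/cgroup.procs
    echo $OSD_PID > /sys/fs/cgroup/cpu/osd0/cgroup.procs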

Some workloads exhibit a >100% performance gain when everything aligns in a NUMA system, compared to SMP mode on the same hardware. You likely won’t notice it on light workloads, as the interconnects (QPI) are very fast and there’s a lot of bandwidth, but for stuff like big OLAP databases or other data-manipulation workloads there’s a huge difference. And with CEPH being CPU hungry and memory intensive, we’re seeing some big gains here just by co-locating the memory with the processes.
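
(One quick way to see how well an OSD's memory is co-located with where it runs - numastat ships with the numactl package, and $OSD_PID is a placeholder:)

    # per-NUMA-node memory breakdown for one process
    numastat -p $OSD_PID
    # which CPU each thread last ran on (PSR column)
    ps -L -o pid,tid,psr,comm -p $OSD_PID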
Could you elaborate a bit on this? I'm interested to learn in what situations memory locality helps Ceph, and to what extent.


Jan

 
On 30 Jun 2015, at 08:12, Ray Sun <xiaoquqi@xxxxxxxxx> wrote:

Sounds great; if there's any update, please let me know.

Best Regards
-- Ray

On Tue, Jun 30, 2015 at 1:46 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
I promised you all our scripts for automatic cgroup assignment - they are already in production here, and I just need to put them on GitHub. Stay tuned tomorrow :-)

Jan


On 29 Jun 2015, at 19:41, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:

Presently, you have to do it by using a tool like ‘taskset’ or ‘numactl’…
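
For example (illustrative only - the PID is taken from the ps listing below, and the core/node numbers are placeholders):

    # pin a running OSD, including all its threads, to cores 0-3
    taskset -acp 0-3 40063
    # or start an OSD bound to NUMA node 0 (both CPUs and memory)
    numactl --cpunodebind=0 --membind=0 \
        /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid \
        -c /etc/ceph/ceph.conf --cluster ceph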
 
Thanks & Regards
Somnath
 
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Ray Sun
Sent: Monday, June 29, 2015 9:19 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject:  How to use cgroup to bind ceph-osd to a specific cpu core?
 
Cephers,
I want to bind each of my ceph-osd processes to a specific CPU core, but I didn't find any document explaining how to do that - could anyone provide me with some detailed information? Thanks.
 
Currently, my ceph is running like this:
 
root      28692      1  0 Jun23 ?        00:37:26 /usr/bin/ceph-mon -i seed.econe.com --pid-file /var/run/ceph/mon.seed.econe.com.pid -c /etc/ceph/ceph.conf --cluster ceph
root      40063      1  1 Jun23 ?        02:13:31 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph
root      42096      1  0 Jun23 ?        01:33:42 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph
root      43263      1  0 Jun23 ?        01:22:59 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph
root      44527      1  0 Jun23 ?        01:16:53 /usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c /etc/ceph/ceph.conf --cluster ceph
root      45863      1  0 Jun23 ?        01:25:18 /usr/bin/ceph-osd -i 4 --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf --cluster ceph
root      47462      1  0 Jun23 ?        01:20:36 /usr/bin/ceph-osd -i 5 --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph
 
Best Regards
-- Ray












--
Regards
Huang Zhiteng




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
