Re: How to use cgroup to bind ceph-osd to a specific cpu core?

Saverio Proto <zioproto@xxxxxxxxx> · Mon, 27 Jul 2015 13:23:17 +0200

Hello Jan,

I am testing your scripts because we also want to try running OSDs and VMs
on the same server.

I am new to cgroups, so this might be a very basic question.
In your script you always refer to the file
/cgroup/cpuset/libvirt/cpuset.cpus

but on my system the file is at /sys/fs/cgroup/cpuset/libvirt/cpuset.cpus

I am working on Ubuntu 14.04.

Does this difference come from something special in your setup, or is it
because we are working on different Linux distributions?
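
For what it is worth, a script could avoid hard-coding either path by
looking up where the cpuset hierarchy is mounted. The following is only a
minimal sketch, assuming a cgroup v1 cpuset hierarchy that is listed in
/proc/mounts; the function names are my own and are not taken from the
pincpus script.

```python
def find_cpuset_mount(mounts_text):
    """Return the mount point of the cgroup v1 cpuset hierarchy, or None.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # A v1 controller appears as fs type "cgroup" with its name in the options.
        if fs_type == "cgroup" and "cpuset" in options.split(","):
            return mount_point
    return None


def cpuset_mount_from_proc(path="/proc/mounts"):
    """Convenience wrapper (hypothetical helper) that reads the live mount table."""
    with open(path) as f:
        return find_cpuset_mount(f.read())
```

With the mount point known, the libvirt file would be
<mount point>/libvirt/cpuset.cpus on either layout.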

Thanks for the clarification.

Saverio

2015-06-30 17:50 GMT+02:00 Jan Schermer <jan@xxxxxxxxxxx>:
> Hi all,
> our script is available on GitHub
>
> https://github.com/prozeta/pincpus
>
> I haven’t had much time to write a proper README, but I hope the configuration
> is self-explanatory enough for now.
> What it does is pin each OSD into the most “empty” cgroup assigned to a NUMA
> node.
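
(Inline note: as I read the description above, the placement rule is "always
pick the cgroup that currently holds the fewest OSDs". Below is a minimal
sketch of that idea. It is not the pincpus implementation; the cgroup names
and the function are hypothetical, and ties are broken by name order.)

```python
def assign_to_emptiest(osd_ids, cgroup_names):
    """Greedily place each OSD into the cgroup with the fewest members so far.

    Returns a dict mapping cgroup name -> list of OSD ids placed there.
    """
    members = {name: [] for name in cgroup_names}
    for osd in osd_ids:
        # sorted() makes tie-breaking deterministic: the first name wins.
        target = min(sorted(members), key=lambda name: len(members[name]))
        members[target].append(osd)
    return members


if __name__ == "__main__":
    # Six OSDs spread over two hypothetical per-NUMA-node cgroups.
    print(assign_to_emptiest(range(6), ["node0", "node1"]))
```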
>
> Let me know how it works for you!
>
> Jan
>
>
> On 30 Jun 2015, at 10:50, Huang Zhiteng <winston.d@xxxxxxxxx> wrote:
>
>
>
> On Tue, Jun 30, 2015 at 4:25 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>
>> Not having OSDs and KVMs compete against each other is one thing.
>> But there are more reasons to do this:
>>
>> 1) not moving the processes and threads between cores that much (better
>> cache utilization)
>> 2) aligning the processes with memory on NUMA systems (that means all
>> modern dual socket systems) - you don’t want your OSD running on CPU1 with
>> memory allocated to CPU2
>> 3) the same goes for other resources like NICs or storage controllers -
>> but that’s less important and not always practical to do
>> 4) you can limit the scheduling domain on Linux if you limit the cpuset
>> for your OSDs (I’m not sure how important this is, just best practice)
>> 5) you can easily limit memory or CPU usage, set priority, with much
>> greater granularity than without cgroups
>> 6) if you have HyperThreading enabled you get the most gain when the
>> workloads on the threads are dissimilar - so to get the highest throughput
>> you have to pin OSD to thread1 and KVM to thread2 on the same core. We’re
>> not doing that because latency and performance of the core can vary
>> depending on what the other thread is doing. But it might be useful to
>> someone.
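
(Inline note on points 2 and 4: the CPUs that belong to each NUMA node can be
read from /sys/devices/system/node/nodeN/cpulist, which uses the kernel's CPU
list format, for example 0-5,12-17. The cpuset.cpus file accepts the same
format. A minimal sketch of parsing it follows; both helper names are
hypothetical.)

```python
def parse_cpulist(text):
    """Expand a kernel CPU list string such as "0-3,8-11" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Read the CPU list of one NUMA node from sysfs (Linux only)."""
    with open(f"{sysfs_root}/node{node}/cpulist") as f:
        return parse_cpulist(f.read())
```

The list returned for a node is what one would write, in list format, into
the cpuset.cpus of the cgroup meant for that node, with the node number going
into cpuset.mems so that memory stays local.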
>>
>> Some workloads exhibit >100% performance gain when everything aligns in a
>> NUMA system, compared to SMP mode on the same hardware. You likely won’t
>> notice it on light workloads, as the interconnects (QPI) are very fast and
>> there’s a lot of bandwidth, but for stuff like big OLAP databases or other
>> data-manipulation workloads there’s a huge difference. And with Ceph being
>> CPU-hungry and memory-intensive, we’re seeing some big gains here just by
>> co-locating the memory with the processes…
>
> Could you elaborate a bit on this?  I'm interested to learn in what situations
> memory locality helps Ceph, and to what extent.
>>
>>
>>
>> Jan
>>
>>
>>
>> On 30 Jun 2015, at 08:12, Ray Sun <xiaoquqi@xxxxxxxxx> wrote:
>>
>> Sounds great. Please let me know of any updates.
>>
>> Best Regards
>> -- Ray
>>
>> On Tue, Jun 30, 2015 at 1:46 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>>
>>> I promised you all our scripts for automatic cgroup assignment - they are
>>> already in production and I just need to put them on GitHub, so stay tuned
>>> for tomorrow :-)
>>>
>>> Jan
>>>
>>>
>>> On 29 Jun 2015, at 19:41, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
>>>
>>> Presently, you have to do it using a tool like ‘taskset’ or ‘numactl’…
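
(Inline note: as a rough illustration of what taskset does, the same affinity
call is available from Python on Linux as os.sched_setaffinity. The sketch
below is not a Ceph feature; pin_all_threads is a hypothetical helper. It
walks /proc/<pid>/task because the call affects one thread at a time, while
ceph-osd is multi-threaded.)

```python
import os


def pin_all_threads(pid, cpus):
    """Pin every existing thread of process `pid` to the CPU set `cpus`.

    Returns the number of threads whose affinity was set. Threads created
    later inherit the affinity of the thread that spawns them.
    """
    pinned = 0
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
            pinned += 1
        except ProcessLookupError:
            # The thread exited between listing and pinning; skip it.
            continue
    return pinned
```

For example, pin_all_threads(40063, {2}), run as root, would restrict osd.0
from the process listing further down to core 2, roughly like
taskset -acp 2 40063.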
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>>> Ray Sun
>>> Sent: Monday, June 29, 2015 9:19 AM
>>> To: ceph-users@xxxxxxxxxxxxxx
>>> Subject:  How to use cgroup to bind ceph-osd to a specific
>>> cpu core?
>>>
>>> Cephers,
>>> I want to bind each of my ceph-osd processes to a specific CPU core, but I
>>> couldn't find any documentation that explains how. Could anyone provide
>>> some detailed information? Thanks.
>>>
>>> Currently, my Ceph processes are running like this:
>>>
>>> root      28692      1  0 Jun23 ?        00:37:26 /usr/bin/ceph-mon -i
>>> seed.econe.com --pid-file /var/run/ceph/mon.seed.econe.com.pid -c
>>> /etc/ceph/ceph.conf --cluster ceph
>>> root      40063      1  1 Jun23 ?        02:13:31 /usr/bin/ceph-osd -i 0
>>> --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf --cluster ceph
>>> root      42096      1  0 Jun23 ?        01:33:42 /usr/bin/ceph-osd -i 1
>>> --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph
>>> root      43263      1  0 Jun23 ?        01:22:59 /usr/bin/ceph-osd -i 2
>>> --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph
>>> root      44527      1  0 Jun23 ?        01:16:53 /usr/bin/ceph-osd -i 3
>>> --pid-file /var/run/ceph/osd.3.pid -c /etc/ceph/ceph.conf --cluster ceph
>>> root      45863      1  0 Jun23 ?        01:25:18 /usr/bin/ceph-osd -i 4
>>> --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf --cluster ceph
>>> root      47462      1  0 Jun23 ?        01:20:36 /usr/bin/ceph-osd -i 5
>>> --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>
>>> Best Regards
>>> -- Ray
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>>
>>
>
>
>
> --
> Regards
> Huang Zhiteng
>
>
>
>