It seems that NUMA can be problematic for ceph-osd daemons in certain circumstances. Namely, if memory is allocated unevenly and a NUMA zone starts running out of free memory, that zone can enter reclaim mode when threads or processes scheduled on a core in the zone request more memory than the zone has remaining. To satisfy those allocations, the kernel has to page out some of the contents of the contended zone, which can have dramatic performance implications due to cache misses, etc.

I see two ways an operator could alleviate these issues:

1. Set the vm.zone_reclaim_mode sysctl to 0 and prefix the ceph-osd daemons with "numactl --interleave=all". This should probably be activated by a flag in /etc/default/ceph and a change to the ceph-osd.conf upstart script, along with adding a dependency on the "numactl" package to the ceph package's debian/control file.

2. Use a cgroup for each ceph-osd daemon, pinning each one to cores and memory in the same NUMA zone using cpuset.cpus and cpuset.mems. This would probably also live in /etc/default/ceph and the upstart scripts.

-- Kyle
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
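
For reference, here is a rough sketch of what the two options in the message above could compute. It is not taken from any Ceph packaging. The function names are hypothetical, the round-robin placement is only one possible policy, and the cpulist parser assumes the plain "0-3,8-11" form that the kernel exposes under /sys/devices/system/node/nodeN/cpulist.

```python
def parse_cpulist(text):
    """Expand a cpulist string such as "0-3,8-11" into a sorted list of CPU ids.

    Assumption: only the plain range/comma form is handled.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty string means the node has no CPUs
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def interleaved_command(osd_argv):
    """Option 1: prefix the daemon command line with numactl --interleave=all."""
    return ["numactl", "--interleave=all"] + list(osd_argv)


def cpuset_settings(node_id, node_cpulist):
    """Option 2: values for a per-OSD cpuset cgroup confined to one NUMA node."""
    parse_cpulist(node_cpulist)  # raises ValueError if the list is malformed
    return {
        "cpuset.cpus": node_cpulist.strip(),  # cores belonging to the node
        "cpuset.mems": str(node_id),          # memory from the same node
    }


def assign_osds_to_nodes(osd_ids, node_ids):
    """Hypothetical placement policy: spread OSDs round-robin across nodes."""
    return {osd: node_ids[i % len(node_ids)] for i, osd in enumerate(osd_ids)}
```

An init script could pass the result of interleaved_command(["ceph-osd", "-i", "0"]) to exec, or write the values from cpuset_settings() into the files of the same name in the OSD's cgroup directory. The vm.zone_reclaim_mode setting from option 1 is separate and would still be applied through sysctl.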