Re: OSD startup causing slow requests - one tip from me

On Fri, Jul 31, 2015 at 5:47 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> I know a few other people here were battling with the occasional issue of OSD being extremely slow when starting.
>
> I personally run OSDs mixed with KVM guests on the same nodes, and was baffled by this issue occurring mostly on the most idle (empty) machines.
> Thought it was some kind of race condition where the OSD started too fast and the disks couldn't catch up; I was investigating latency of CPUs and cards on mostly idle hardware etc. - with no improvement.
>
> But in the end, most of my issues were caused by page cache using too much memory. This doesn't cause any problems when the OSDs have their memory allocated and are running, but when the OSD is (re)started, the OS struggles to allocate contiguous blocks of memory for it and its buffers.
> This could also be why I’m seeing such an improvement with my NUMA pinning script - cleaning memory on one node is probably easier and doesn’t block allocations on other nodes.
>

Although this makes sense to me, I am still shocked by the fact
that page cache reclaim or memory fragmentation can cause slow requests!
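
For anyone who wants to confirm they are hitting the same thing, something
like the following should show both symptoms while an OSD is restarting
(just a sketch - pidstat needs sysstat installed, plain top works as well):

# kswapd burning CPU during the restart means the kernel is busy reclaiming memory
pidstat -p $(pgrep -d, kswapd) 1
# or, without sysstat:
top -b -d 1 | grep kswapd

# free-list fragmentation: mostly-zero higher-order columns mean the kernel
# will struggle to hand out contiguous blocks of memory
cat /proc/buddyinfo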

> How can you tell if this is your case? When restarting an OSD that has this issue, look for CPU usage of “kswapd” processes. If it is >0 then you have this issue and would benefit from setting this:
>
> for i in $(mount |grep "ceph/osd" |cut -d' ' -f1 |cut -d'/' -f3 |tr -d '[0-9]') ; do echo 1 >/sys/block/$i/bdi/max_ratio ; done
> (another option is echo 1 > drop_caches before starting the OSD, but that’s a bit brutal)
>
> What this does is it limits the pagecache size for each block device to 1% of physical memory. I’d like to limit it even further but it doesn’t understand “0.3”...
>
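In case it helps others, you can check what each device is currently set to
before and after running the loop - the default is 100 (percent), and sd* here
is only an example pattern:

grep . /sys/block/sd*/bdi/max_ratio
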
> Let me know if it helps, I’ve not been able to test if this cures the problem completely, but there was no regression after setting it.
>
> Jan
>
> P.S. This is for RHEL 6 / CentOS 6 ancient 2.6.32 kernel, newer kernels have tunables to limit the overall pagecache size. You can also set the limits in cgroups but I’m afraid that won’t help in this case as you can only set the whole memory footprint limit where it will battle for allocations anyway.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
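
And for completeness, the "brutal" drop_caches alternative Jan mentions would
look roughly like this before starting the OSD (the osd.12 id and the sysvinit
invocation are only examples - adjust for your setup):

sync                               # flush dirty pages first so the drop actually frees memory
echo 1 > /proc/sys/vm/drop_caches  # 1 = drop clean page cache only (not dentries/inodes)
service ceph start osd.12          # start the OSD while the freed memory is still available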



-- 
Best Regards,

Wheat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



