Hi list,
The Ceph version is jewel 10.2.10 and all OSDs are using filestore.
The cluster has 96 OSDs and one pool with size=2 replication and 4096 PGs
(based on the PG calculation method from the Ceph docs, targeting 100 PGs
per OSD).
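For reference, this is the arithmetic I followed (just my reading of the
pgcalc formula, rounding to the nearest power of two as the calculator does):

    import math

    osds = 96
    target_pgs_per_osd = 100
    pool_size = 2

    raw = osds * target_pgs_per_osd / pool_size   # 9600 / 2 = 4800
    pg_num = 2 ** int(round(math.log(raw, 2)))    # nearest power of two
    print(pg_num)                                 # 4096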
The OSD with the most PGs has 104, and there are 6 OSDs with more than
100 PGs. Most OSDs have somewhere in the 70s-90s, and the OSD with the
fewest has 58.
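In case it matters how I counted: I piped `ceph pg dump --format json` into
a small script like the one below. The field names (pg_stats, acting) are
what my jewel cluster emits; other versions may differ.

    import json
    import sys
    from collections import Counter

    # Count how many PGs each OSD appears in (acting set), reading
    # `ceph pg dump --format json` from stdin.
    dump = json.load(sys.stdin)
    counts = Counter()
    for pg in dump["pg_stats"]:
        for osd in pg["acting"]:
            counts[osd] += 1

    for osd, n in sorted(counts.items(), key=lambda kv: kv[1]):
        print("osd.%d: %d PGs" % (osd, n))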
During the write test, some of the OSDs show very high fs_apply_latency
(1000-4000 ms) while the normal ones are around 100-600 ms. The OSDs with
high latency are always the ones with more PGs on them.
iostat on the high-latency OSDs shows their HDDs at about 95%-96% %util,
while the normal ones are at 40%-60%.
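(The latency numbers come from `ceph osd perf`. I sorted them with roughly
the sketch below; the JSON field names are what I see on jewel, so treat
them as an assumption on other versions.)

    import json
    import sys

    # Print OSDs sorted by fs_apply_latency, reading
    # `ceph osd perf --format json` from stdin. The field names
    # (osd_perf_infos, perf_stats, ...) are from my jewel cluster.
    perf = json.load(sys.stdin)
    infos = sorted(perf["osd_perf_infos"],
                   key=lambda i: i["perf_stats"]["apply_latency_ms"],
                   reverse=True)
    for i in infos:
        print("osd.%d apply=%dms commit=%dms"
              % (i["id"],
                 i["perf_stats"]["apply_latency_ms"],
                 i["perf_stats"]["commit_latency_ms"]))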
I think the cause is that the OSDs with more PGs have to handle more write
requests. Is that right? But even though the PG distribution is not even,
the variation is not that much, so how can the performance be so sensitive
to it?
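To quantify "not that much", here is my back-of-envelope (it assumes client
writes are spread evenly across PGs, which may not hold):

    # Relative write load per OSD should be roughly proportional to PG count.
    pg_num = 4096
    pool_size = 2
    osds = 96

    mean = pg_num * pool_size / float(osds)   # ~85.3 PG replicas per OSD
    print(104 / mean)                         # busiest OSD: ~1.22x average
    print(58 / mean)                          # quietest OSD: ~0.68x average

    # Disk queueing delay blows up near saturation: in a simple M/M/1 model
    # latency scales like 1/(1 - util), so going from 60% to 95% util means
    # roughly (1 - 0.60) / (1 - 0.95) = 8x the wait.
    print((1 - 0.60) / (1 - 0.95))

If that queueing argument is right, a ~22% load imbalance might be enough to
push the busiest disks past the knee of the latency curve, but I would
appreciate a sanity check.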
Is there anything I can do to improve the performance and reduce the
latency? And how can I make the PG distribution more even?
Thanks
2018-03-07
shadowlin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com