Reduced productivity because of slow requests

Grigory Murashov <murashov@xxxxxxxxxxxxxx> · Wed, 6 Jun 2018 14:57:46 +0300

Hello cephers!

I have a Luminous 12.2.5 cluster of 3 nodes with 5 OSDs each, serving S3 
through RGW. All OSDs are on HDDs.

About twice a day I get slow requests, which reduce cluster performance. 
They can start during the daytime peak or at night; the time of day 
doesn't seem to matter.

This is what ceph health detail shows:
https://avatars.mds.yandex.net/get-pdb/234183/9ba023d0-4352-4235-8826-76b412016e9f/s1200

top and iostat output on the node hosting osd.21:
https://avatars.mds.yandex.net/get-pdb/51720/52ef79c1-eb1a-450a-8c95-675077045b84/s1200

https://avatars.mds.yandex.net/get-pdb/51720/0d98131c-82d3-4274-a406-743490e1f966/s1200

In effect, it reduces the cluster's I/O operations for about half an hour, 
twice a day:
https://avatars.mds.yandex.net/get-pdb/222681/bed8f638-f259-403e-83cb-c7bfb30f14f1/s1200

This is the normal I/O while the status is OK:
https://avatars.mds.yandex.net/get-pdb/245485/33ee3a53-083a-4656-b585-8df0007db2e2/s1200

This is how it affects incoming traffic to RGW:
https://avatars.mds.yandex.net/get-pdb/51720/5a486d30-0d44-46f0-8f0f-668a05947bc8/s1200

Since it starts at arbitrary times, but happens twice a day and lasts for 
a fixed period, I assume it could be some recovery or rebalancing operation.
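
One way to check that pattern is to write down when each slow-request 
episode started and ended and compare them. The sketch below is only an 
illustration: summarize_incidents is a hypothetical helper, not part of 
Ceph, and the timestamps in the example are placeholders, not values taken 
from my cluster.

```python
from datetime import datetime
from statistics import mean, pstdev


def summarize_incidents(incidents):
    """Summarize slow-request episodes.

    incidents: non-empty list of (start, end) datetime pairs, sorted by start.
    Returns the mean duration and its spread (minutes), the gaps between
    consecutive starts (hours), and the distinct start hours.
    """
    durations = [(end - start).total_seconds() / 60 for start, end in incidents]
    gaps = [
        (incidents[i + 1][0] - incidents[i][0]).total_seconds() / 3600
        for i in range(len(incidents) - 1)
    ]
    return {
        "mean_duration_min": mean(durations),
        "duration_spread_min": pstdev(durations),
        "gaps_hours": gaps,
        "start_hours": sorted({start.hour for start, _ in incidents}),
    }


if __name__ == "__main__":
    # Placeholder timestamps for illustration only (not real measurements).
    example = [
        (datetime(2018, 6, 5, 3, 10), datetime(2018, 6, 5, 3, 40)),
        (datetime(2018, 6, 5, 15, 0), datetime(2018, 6, 5, 15, 30)),
    ]
    print(summarize_incidents(example))
```

A small duration spread together with irregular gaps would fit the 
"fixed length, arbitrary start" pattern described above; regular gaps 
would instead point to something scheduled.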

I tried to find something in the OSD logs, but there is nothing about it.

Any thoughts on how to avoid it?

Appreciate your help.

--
Grigory Murashov

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com