Slow OSD problem

Aleksei Gutikov <aleksey.gutikov@xxxxxxxxxx> · Wed, 27 Dec 2017 20:57:26 +0300

Hi all.

In general, a single slow OSD significantly affects every RBD client in
the cluster.

Suppose just one OSD in an RBD pool has significantly increased latency,
for example 30 ms, while the others are at 0-1 ms. Whatever the reason,
it has not crashed; it has just slowed down.
Then every RBD client (application) will periodically see fsync (or
direct I/O) latencies of 10 seconds or more.

As I understand it, the OSD latency is somehow multiplied by the length
of some abstract queue between the application and the OSD, so the write
latency seen by the application becomes, for example, more than 10 s.
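To make the "abstract queue" intuition concrete, here is a minimal sketch of a single FIFO server, assuming (hypothetically) that requests reaching the slow OSD are served strictly one at a time in arrival order. This is a textbook queueing model, not a description of how the Ceph OSD op queue actually schedules work, and `fifo_latencies` is an illustrative name. Once requests arrive faster than the OSD can serve them (more than about 33 per second at 30 ms each), every request waits behind the backlog of earlier ones and latency grows with the queue length, while the same load on a 1 ms OSD never queues at all.

```python
def fifo_latencies(arrivals, service_time):
    """Latency (queueing + service) of each request at one FIFO server.

    arrivals: request arrival times in seconds, sorted ascending.
    service_time: seconds the server spends on each request.
    """
    latencies = []
    server_free_at = 0.0
    for t in arrivals:
        # A request starts when it arrives or when the previous one finishes,
        # whichever is later; the difference is time spent waiting in the queue.
        start = max(t, server_free_at)
        server_free_at = start + service_time
        latencies.append(server_free_at - t)
    return latencies


if __name__ == "__main__":
    arrivals = [k / 40 for k in range(400)]  # 40 writes/s for 10 s
    print("slow OSD (30 ms):", fifo_latencies(arrivals, 0.030)[-1])  # ~2 s
    print("fast OSD (1 ms): ", fifo_latencies(arrivals, 0.001)[-1])  # ~1 ms
```

In this model the last of 400 requests waits about 2 s at 30 ms per op but only 1 ms at 1 ms per op; a longer burst at the same rate pushes the figure further, toward the 10 s stalls described above.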

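The average period between writes that hit the slow OSD is something
like:
(number of OSDs) * (client write period) / (number of writes into
different RBD objects per period)
So if the pool contains 100 OSDs and the application touches 2 RBD
objects every second, then roughly every 50 s (100 * 1 s / 2), i.e.
about once a minute, a write will go to the slow OSD and the application
may get stuck.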
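A quick way to sanity-check this estimate is to compare the formula with a Monte Carlo run. The sketch below is hypothetical and rests on simplifying assumptions: each write lands on OSDs chosen uniformly at random (a stand-in for CRUSH placement with equal weights), OSD 0 plays the slow OSD, and a write stalls whenever the slow OSD is among its targets. The function names and the `osds_per_write` parameter are illustrative. With a replicated pool a write is acknowledged only after every OSD in the acting set has it, so setting `osds_per_write` to the pool size gives a rough, more pessimistic figure.

```python
import random


def expected_period(num_osds, write_period_s, objects_per_period, osds_per_write=1):
    """Estimate above: mean seconds between writes that touch the slow OSD."""
    return num_osds * write_period_s / (objects_per_period * osds_per_write)


def simulate_period(num_osds, write_period_s, objects_per_period,
                    periods, osds_per_write=1, seed=0):
    """Monte Carlo version of the same estimate (hypothetical uniform placement)."""
    rng = random.Random(seed)
    slow_osd = 0
    hits = 0
    for _ in range(periods):
        for _ in range(objects_per_period):
            # Distinct OSDs this write must wait for.
            targets = rng.sample(range(num_osds), osds_per_write)
            if slow_osd in targets:
                hits += 1
    elapsed = periods * write_period_s
    return elapsed / hits if hits else float("inf")


if __name__ == "__main__":
    print(expected_period(100, 1.0, 2))            # 100 OSDs, 2 objects/s
    print(simulate_period(100, 1.0, 2, 100_000))   # should land close to it
    print(expected_period(100, 1.0, 2, osds_per_write=3))  # size=3 pool
```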

As I understand it, this is a general issue caused by the random data
distribution. But is there any way to handle this case, given that the
client really is cluster-aware?
Are there any timeouts for RBD clients after which the client switches
to a secondary OSD in the PG, or even an option to send duplicate
messages from the client to several OSDs in the PG at the same time?
Some applications might prefer, for example, doubled network usage in
exchange for lower latency.
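The duplicate-request idea resembles the generic "hedged request" pattern: send to one replica, and if it has not answered within a deadline, also send to a second replica and use whichever reply arrives first. The sketch below is a hypothetical illustration of that pattern only. `hedged_request`, `send` and the OSD names are made up; it is not an existing RBD or librados option, and it ignores Ceph's consistency rules (client writes are coordinated by the PG's primary OSD, so a write cannot simply be redirected to a secondary).

```python
import asyncio


async def hedged_request(send, replicas, hedge_after_s):
    """Send to replicas[0]; if no reply within hedge_after_s, also send to
    replicas[1] and return whichever reply arrives first."""
    primary = asyncio.create_task(send(replicas[0]))
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()  # primary answered in time, no extra traffic
    backup = asyncio.create_task(send(replicas[1]))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # drop the slower copy of the request
    return done.pop().result()


if __name__ == "__main__":
    latency = {"osd.1": 0.5, "osd.2": 0.001}  # hypothetical: osd.1 is slow

    async def fake_send(osd):
        await asyncio.sleep(latency[osd])
        return osd

    print(asyncio.run(hedged_request(fake_send, ["osd.1", "osd.2"], 0.05)))
```

Hedging only after a delay, rather than always duplicating, limits the extra traffic to the requests that are already slow, a middle ground between doubled network usage and no redundancy at all.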

By the way, since the RBD client does not update the OSD map itself but
only gets it from the monitors, I'm afraid to think what will happen if
a monitor has a connection to some OSD while the RBD client, for some
reason, does not.

--

Best regards,
Aleksei Gutikov
Software Engineer | synesis.ru | Minsk. BY
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com