Hi all.
In general, a single slow OSD significantly affects every RBD client in
the cluster.
Suppose just one OSD in an RBD pool has significantly increased latency,
for example 30 ms while the others are at 0-1 ms. It has not crashed, it
has just slowed down, for whatever reason.
Then every RBD client (application) will periodically see fsync (or
direct I/O) latency of 10 seconds or more.
As I understand it, the OSD latency is somehow multiplied by the length
of some abstract queue between the application and the OSD, so the write
latency seen by the application becomes, for example, over 10 s.
The average interval between hits on the slow OSD is something like:
(number of OSDs) * (client write period)
/ (number of writes going to different RBD objects per period)
So if the pool contains 100 OSDs and the app touches 2 RBD objects every
second, then roughly once a minute (100 * 1 s / 2 = 50 s) a write will
go to the slow OSD and the app may get stuck.
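
To make the estimate above concrete, here is a minimal Python sketch of
that arithmetic. The function name and parameters are mine, purely for
illustration, and it assumes writes are spread uniformly over the OSDs
with exactly one slow OSD in the pool:

```python
def slow_osd_hit_interval(num_osds, write_period_s, objects_per_period):
    """Rough average time, in seconds, between writes landing on the slow OSD.

    Assumes a uniform random spread of objects over OSDs and one slow OSD.
    """
    if num_osds <= 0 or objects_per_period <= 0:
        raise ValueError("num_osds and objects_per_period must be positive")
    return num_osds * write_period_s / objects_per_period


# 100 OSDs, the app touches 2 different RBD objects every second
print(slow_osd_hit_interval(100, 1.0, 2))  # 50.0
```

This is only a back-of-the-envelope number; it ignores replication and
the actual placement of PGs.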
As I understand it, this is a general issue caused by random data
distribution.
But is there any way to handle this case, given that the client really
is cluster-aware?
For example, a timeout after which an RBD client switches to a secondary
OSD in the PG, or even having the client send duplicate messages to
several OSDs in the PG at the same time? Some applications might prefer
2x network usage in exchange for lower latency.
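
To illustrate the kind of client behaviour I am asking about, here is a
small Python asyncio sketch of the "fall back after a timeout, take the
first reply" idea. It is not Ceph or librbd code and, as far as I know,
not an existing feature; fake_osd_write, hedged_write and the latencies
are all made up for the example:

```python
import asyncio


async def fake_osd_write(name, latency_s):
    """Hypothetical stand-in for a write to one OSD; it only sleeps."""
    await asyncio.sleep(latency_s)
    return name


async def hedged_write(primary, secondary, hedge_after_s):
    """Try the primary; if it has not replied within hedge_after_s,
    also send to the secondary and return whichever replies first."""
    first = asyncio.ensure_future(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_after_s)
    if done:
        return first.result()
    second = asyncio.ensure_future(secondary())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the request that lost the race
    return done.pop().result()


def demo():
    # The "primary" is slow (500 ms), the "secondary" is fast (10 ms).
    return asyncio.run(
        hedged_write(
            lambda: fake_osd_write("slow-osd", 0.5),
            lambda: fake_osd_write("fast-osd", 0.01),
            hedge_after_s=0.05,
        )
    )


print(demo())  # fast-osd
```

The cost is the extra request whenever the timeout fires, which is the
network usage versus latency trade-off mentioned above. The sketch says
nothing about keeping replicas consistent, which a real implementation
would have to solve.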
By the way, since the RBD client does not update the OSD map itself and
only gets it from the mon, I'm afraid to think what will happen if the
mon has a connection to some OSD and the RBD client, for some reason,
does not.
--
Best regards,
Aleksei Gutikov
Software Engineer | synesis.ru | Minsk. BY