Hi, On 07/12/13 19:57, Edwin Peer wrote:
Seconds of downtime is quite severe, especially for a planned shutdown or rejoin. I can understand that when an OSD simply disappears, some requests might still be directed to the now-gone node, but I see similar latency hiccups on scheduled shutdowns and rejoins too?
Have you tried reducing "osd recovery max active" and "osd max backfills" from their defaults? There are also options to lower the priority of recovery traffic.
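As a sketch, the options above could be set in the [osd] section of ceph.conf; the values here are illustrative starting points for throttling recovery, not recommendations:

```
[osd]
; limit concurrent backfill operations per OSD (default 10)
osd max backfills = 1
; limit concurrent recovery operations per OSD (default 5)
osd recovery max active = 1
; lower recovery priority relative to client ops (default 10)
osd recovery op priority = 5
```

A restart of the OSDs (or runtime injection) is needed for the changes to take effect.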
The defaults are a priority of 63 for client ops and 10 for recovery, with "osd recovery max active" at 5 and 2 I/O threads per OSD. I noticed that reducing "osd recovery max active" to 1 lowered the I/O latency penalty while recovery is active.
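If you want to try this without restarting anything, the setting can be injected into running OSDs; a sketch (the value 1 is just the figure mentioned above):

```
# push the new limit to all OSDs at runtime
ceph tell osd.* injectargs '--osd-recovery-max-active 1'
```

Note that injected values do not survive an OSD restart unless they are also written to ceph.conf.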
Reweighting an OSD to 0.9 should be enough to let you see how your cluster performs under recovery.
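For example (the OSD id 12 is arbitrary), triggering a small rebalance and watching its effect on latency might look like:

```
# nudge one OSD's weight down to start a modest data migration
ceph osd reweight 12 0.9

# watch recovery progress and client I/O behaviour while it runs
ceph -w

# restore the original weight afterwards
ceph osd reweight 12 1.0
```

This only moves a fraction of the data, so it is a fairly safe way to observe recovery impact on a live cluster.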
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com