Hello All,
I'm writing because I'm trying to find a way to rebuild an OSD disk without impacting the performance of the cluster.
That's because my applications are very latency sensitive.
1_ I found a way to reuse an OSD ID so that the cluster does not rebalance every time I lose a disk.
To that end, my cluster runs with the noout flag set permanently.
The point here is to do the disk replacement as fast as I can, roughly as sketched below.
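For reference, the replacement flow looks something like this (a sketch, assuming a Luminous-or-later cluster deployed with ceph-volume; /dev/sdX is a placeholder for the new device):

ceph osd set noout                                   # set once, cluster-wide; dead OSDs are not marked out
ceph osd destroy 352 --yes-i-really-mean-it          # keeps the OSD ID and its CRUSH entry
ceph-volume lvm create --osd-id 352 --data /dev/sdX  # recreate the OSD on the new disk, reusing the ID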
2_ After reusing the OSD ID, I leave the OSD up and running, but with ZERO weight.
For example:
root@DC4-ceph03-dn03:/var/lib/ceph/osd/ceph-352# ceph osd tree | grep 352
352 1.81999 osd.352 up 0 1.00000
At this point everything is good.
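In case it is useful, pinning the reweight at zero right after the new daemon comes up is just:

ceph osd reweight 352 0    # up and in, but CRUSH maps no PGs to it yet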
3_ Then I start the reweight. Using "ceph osd reweight" I'm not touching the CRUSH map, and I do the reweight very gradually.
Example:
ceph osd reweight 352 0.001
But even doing the reweight this way, I sometimes hit latency issues.
The more PGs the cluster is recovering, the worse the impact; a scripted version of the stepping I do is sketched below.
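This is just a sketch of that stepping (the weight increments and the sleep interval are arbitrary, not tested values):

#!/bin/bash
# step osd.352 up gradually, letting recovery drain between steps
for w in 0.001 0.01 0.05 0.1 0.25 0.5 0.75 1.0; do
    ceph osd reweight 352 "$w"
    # wait until no PG reports backfill/recovery before the next step
    while ceph pg stat | grep -Eq 'backfill|recover'; do
        sleep 30
    done
done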
Tunings I have already done:
ceph tell osd.* injectargs "--osd_max_backfills 1"
ceph tell osd.* injectargs "--osd_recovery_max_active 1"
ceph tell osd.* injectargs '--osd-max-recovery-threads 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-client-op-priority 63'
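One more throttle I have been looking at but have not fully tested (assuming a release that has it; the _hdd/_ssd variants appeared in Luminous) is osd_recovery_sleep, which forces a pause, in seconds, between successive recovery operations:

ceph tell osd.* injectargs '--osd_recovery_sleep 0.1'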
The question is: are there more parameters to change in order to make the OSD rebuild more gradual?
I'd really appreciate your help. Thanks in advance.
Agustin Trolli
Storage Team
Mercadolibre.com