osd_recovery_delay_start is the delay, in seconds, between recovery iterations (see osd_recovery_max_active).
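
If you want to change it at runtime, the same injectargs mechanism used later in this thread for the other recovery settings should also work for this option. A minimal sketch, assuming the value 10 from the example further down and that all OSDs are targeted (adjust to your own cluster):

ceph tell osd.* injectargs '--osd_recovery_delay_start 10'
# verify on a single OSD via its admin socket (osd.94 as in the example below)
ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start
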
2015-03-03 14:27 GMT+03:00 Andrija Panic <andrija.panic@xxxxxxxxx>:
Another question - I mentioned here that 37% of objects were being moved around - these are MISPLACED objects (degraded objects were 0.001%), after I removed 1 OSD from the CRUSH map (out of 44 OSDs or so).

Can anybody confirm this is normal behaviour - and are there any workarounds?

I understand this is because of CEPH's object placement algorithm, but 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map makes me wonder why the percentage is so large. It seems not good to me, and I have to remove another 7 OSDs (we are demoting some old hardware nodes). This means I could potentially end up with 7 x the same number of misplaced objects...?

Any thoughts?

Thanks

On 3 March 2015 at 12:14, Andrija Panic <andrija.panic@xxxxxxxxx> wrote:

Thanks Irek.

Does this mean that, after peering, each PG will see a delay of 10 sec - meaning that every once in a while the cluster will have 10 sec of NOT being stressed/overloaded, then recovery takes place for that PG, then the cluster is fine for another 10 sec, and then it is stressed again?

I'm trying to understand the process before actually doing stuff (the config reference is there on ceph.com, but I don't fully understand the process).

Thanks,
Andrija

On 3 March 2015 at 11:32, Irek Fasikhov <malmyzh@xxxxxxxxx> wrote:

Hi.

Use the value "osd_recovery_delay_start". Example:

[root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start
"osd_recovery_delay_start": "10"

2015-03-03 13:13 GMT+03:00 Andrija Panic <andrija.panic@xxxxxxxxx>:

Hi guys,

Yesterday I removed 1 OSD from the cluster (out of 42 OSDs), and it caused over 37% of the data to rebalance - let's say this is fine (this happened when I removed it from the CRUSH map).

I'm wondering: I had previously set some throttling, but during the first 1 h of rebalancing my recovery rate went up to 1500 MB/s and the VMs were completely unusable; during the last 4 h of the recovery the rate dropped to, say, 100-200 MB/s, and while VM performance was still quite impacted, at least I could work more or less.

So my question: is this behaviour expected, and is the throttling working as expected? During the first 1 h almost no throttling seemed to be applied, judging by the 1500 MB/s recovery rate and the impact on the VMs, while the last 4 h seemed pretty fine (although there was still a lot of impact in general).

I changed the throttling on the fly with:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'

My journals are on SSDs (12 OSDs per server, of which 6 journals are on one SSD and 6 journals on another SSD) - I have 3 of these hosts.

Any thoughts are welcome.

--
Andrija Panić
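
Regarding the "any workarounds" question above: a large percentage of misplaced objects after a CRUSH change is expected, but the client impact is often reduced by draining an OSD gradually - lowering its CRUSH weight in steps and letting the cluster settle between steps - rather than removing it from the CRUSH map in one go. A rough sketch, assuming the OSD being demoted is osd.41 and starts at CRUSH weight 1.0 (the id, step size, and weights are illustrative only, not taken from this thread):

# halve the CRUSH weight, then wait for the cluster to return to HEALTH_OK
ceph osd crush reweight osd.41 0.5
# drain the remaining data, wait for HEALTH_OK again
ceph osd crush reweight osd.41 0.0
# once the OSD holds no data, mark it out and remove it
ceph osd out 41
ceph osd crush remove osd.41
ceph auth del osd.41
ceph osd rm 41

Smaller weight steps mean more, smaller rebalance rounds, which trades longer total recovery time for lower peak impact on the VMs.
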
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Best regards, Irek Fasikhov
Mob.: +79229045757