Hi people,
I had one OSD crash, so rebalancing happened - all fine (about 3% of the data was moved around and rebalanced), my previous recovery/backfill throttling was applied correctly, and we didn't have an unusable cluster.
Now I used the documented procedure to remove this crashed OSD completely from Ceph: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-the-osd
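For reference, the sequence from that page is roughly the following (osd.0 being my crashed OSD; the exact stop command depends on your init system):

ceph osd out osd.0
service ceph stop osd.0     # or /etc/init.d/ceph stop osd.0, depending on the distro
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm osd.0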
When I ran the "ceph osd crush remove osd.0" command, all of a sudden Ceph started to rebalance once again, this time with 37% of objects "misplaced" - and based on the experience inside the VMs and the recovery rate in MB/s, I can tell that my backfill and recovery throttling is not being taken into consideration.
Why are 37% of all objects being moved around again? Any help, hint, or explanation greatly appreciated.
This is Ceph 0.87.0 from the Ceph repo, of course; 42 OSDs in total after the crash.
The throttling that I had applied beforehand is the following:
ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'
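In case it matters, I believe the persistent equivalents in ceph.conf (so the throttles survive OSD restarts) would look something like this:

[osd]
# same values as the injectargs above
osd recovery max active = 1
osd recovery op priority = 1
osd max backfills = 1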
Please advise...
Thanks
Andrija Panić