Hello all,

I have set up a Ceph cluster consisting of one monitor, 32 OSD hosts (one 320 GB OSD per host), and 16 clients that read from and write to the cluster. I have one erasure-coded pool (shec plugin) with k=8, m=4, c=3 and pg_num=256; the failure domain is host. I am able to reach HEALTH_OK and everything works as expected. The pool was populated with 114048 files of different sizes ranging from 1 kB to 4 GB. The total amount of data in the pool was around 3 TB, and the capacity of the pool was around 10 TB.

I want to evaluate how Ceph rebalances data when 1) I take two OSDs out and 2) I rejoin those two OSDs. For scenario 1) I "kill" two OSDs via ceph osd out <osd-id>. Ceph notices the failure and starts to rebalance data until the cluster reaches HEALTH_OK again. For scenario 2) I rejoin the previously killed OSDs via ceph osd in <osd-id>. Again, Ceph notices the change and rebalances data until it reaches HEALTH_OK. I repeated this whole scenario four times.

What I notice is that rebalancing after the two OSDs rejoin the cluster takes more than three times as long as rebalancing after the loss of the two OSDs. This was consistent over the four runs. I expected both recovery times to be roughly equal, since in both scenarios the number of degraded objects was around 8% and the number of missing objects around 2%. I attached a visualization of the recovery process in terms of degraded and missing objects; the first part is the scenario where the two OSDs "failed", the second is the rejoining of those two OSDs. Note how it takes significantly longer to recover in the second case.

Now I want to understand why it takes longer! I appreciate all hints. Thanks!
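In case the exact procedure matters, this is roughly how I drive each run. The profile name, pool name, and OSD ids below are placeholders, not the exact ones from my setup; the shec parameters and pg_num match what I described above:

  # erasure-coded pool built from a shec profile (names are illustrative)
  ceph osd erasure-code-profile set shec-8-4-3 plugin=shec k=8 m=4 c=3 crush-failure-domain=host
  ceph osd pool create ecpool 256 256 erasure shec-8-4-3

  # scenario 1: take two OSDs out and time the recovery back to HEALTH_OK
  date; ceph osd out 12; ceph osd out 27
  while ! ceph health | grep -q HEALTH_OK; do sleep 10; done; date

  # scenario 2: bring the same OSDs back in and time the recovery again
  date; ceph osd in 12; ceph osd in 27
  while ! ceph health | grep -q HEALTH_OK; do sleep 10; done; date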