Re: Different recovery times for OSDs joining and leaving the cluster

When you lose 2 OSDs, the remaining 30 OSDs all accept the degraded data and share the backfilling work. When the 2 OSDs are added back in, only those 2 OSDs receive the majority of the backfilled data. Two OSDs have far fewer available IOPS and much less spindle throughput than the other 30 had when they were recovering from the loss, and that is your bottleneck.

Because of this, adding OSDs is generally a slower operation than losing them. That holds even when you add brand-new nodes to grow the cluster.
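
As a rough illustration of the argument, here is a back-of-the-envelope sketch in Python. It assumes the receiving OSDs are the only bottleneck and that each one sustains the same write rate. The function name and all numbers are made up for the example and are not measurements from this cluster.

```python
def recovery_time_s(data_gb, receiving_osds, per_osd_write_mb_s):
    """Estimate recovery time in seconds, assuming the OSDs that receive
    the backfilled data are the only bottleneck (a simplification)."""
    aggregate_mb_s = receiving_osds * per_osd_write_mb_s
    return data_gb * 1024 / aggregate_mb_s


# Hypothetical values, chosen only to show the shape of the effect.
data_gb = 200           # amount of data that has to move
per_osd_write_mb_s = 50  # sustained write rate of a single OSD

t_loss = recovery_time_s(data_gb, 30, per_osd_write_mb_s)  # 30 OSDs receive
t_join = recovery_time_s(data_gb, 2, per_osd_write_mb_s)   # 2 OSDs receive

print(f"after losing 2 OSDs:   {t_loss:.0f} s")
print(f"after rejoining 2 OSDs: {t_join:.0f} s")
print(f"ratio: {t_join / t_loss:.1f}x")
```

This toy model ignores the read side, recovery throttling, and the cost of erasure-code reconstruction, so the ratio seen on a real cluster will differ from the one it prints. It only shows why fewer receiving OSDs means a longer backfill.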


On Wed, Sep 27, 2017, 8:43 AM Jonas Jaszkowic <jonasjaszkowic.work@xxxxxxxxx> wrote:
Hello all, 

I have set up a Ceph cluster consisting of one monitor, 32 OSD hosts (one 320 GB OSD per host), and 16 clients that read from
and write to the cluster. I have one erasure-coded pool (shec plugin) with k=8, m=4, c=3 and pg_num=256. The failure domain is host.
I am able to reach a HEALTH_OK state and everything is working as expected. The pool was populated with
114048 files of different sizes ranging from 1 kB to 4 GB. The total amount of data in the pool was around 3 TB. The capacity of the
pool was around 10 TB.

I want to evaluate how Ceph rebalances data when

1) I take out two OSDs and
2) I rejoin these two OSDs.

For scenario 1) I am "killing" two OSDs via ceph osd out <osd-id>. Ceph notices this failure and starts to rebalance data until I
reach HEALTH_OK again.

For scenario 2) I am rejoining the previously killed OSDs via ceph osd in <osd-id>. Again, Ceph notices this change and starts to
rebalance data until it reaches the HEALTH_OK state.

I repeated this whole scenario four times. What I am noticing is that the rebalancing process in the event of two OSDs joining the
cluster takes more than 3 times longer than in the event of the loss of two OSDs. This was consistent over the four runs.
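
A measurement like this could be scripted roughly as follows. This is only a sketch: it assumes the ceph CLI is on the PATH with admin access and that polling ceph health is precise enough for the timing. The helper names (ceph, time_rebalance) and the OSD ids in the usage comment are made up for illustration.

```python
import subprocess
import time


def ceph(*args):
    """Run a ceph CLI command and return its standard output as text."""
    result = subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def time_rebalance(action, osd_ids, run=ceph, poll_s=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Mark OSDs 'out' or 'in' and return the seconds until HEALTH_OK.

    Note: this loops forever if the cluster never leaves or never
    returns to HEALTH_OK; a real script would want a timeout.
    """
    if action not in ("out", "in"):
        raise ValueError("action must be 'out' or 'in'")
    for osd_id in osd_ids:
        run("osd", action, str(osd_id))
    start = clock()
    # The health status can still read HEALTH_OK for a moment after the
    # change, so first wait until the cluster leaves HEALTH_OK.
    while run("health").startswith("HEALTH_OK"):
        sleep(poll_s)
    # Then wait until rebalancing is done and HEALTH_OK is reported again.
    while not run("health").startswith("HEALTH_OK"):
        sleep(poll_s)
    return clock() - start


# Example usage (hypothetical OSD ids):
#   t_out = time_rebalance("out", [3, 7])
#   t_in = time_rebalance("in", [3, 7])
#   print(f"out: {t_out:.0f} s, in: {t_in:.0f} s")
```

The run, sleep, and clock parameters exist only so the polling logic can be tried without a live cluster.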

I expected both recovery times to be equally long, since in both scenarios the number of degraded objects was around 8% and the
number of missing objects around 2%. I attached a visualization of the recovery process in terms of degraded and missing objects:
the first part is the scenario where two OSDs "failed", the second is the rejoining of these two OSDs. Note how it takes significantly longer
to recover in the second case.

Now I want to understand why it takes longer! I appreciate all hints.

Thanks!
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com