Here it is: http://pastebin.com/HfUPDTK4

Someone asked:
I am still a beginner with Ceph, but as far as I understand, Ceph is not designed to lose 33% of the cluster at once and recover rapidly. My understanding is that by losing 1 rack out of 3 you are losing 33% of the cluster, and it will take a very long time to recover before you reach HEALTH_OK status. Can you check with ceph -w how long it takes for Ceph to converge to a healthy cluster after you switch off the switch in Rack-A? If I have a replica of each object in the other remaining racks (thanks to the CRUSH map), why should this impact my platform?
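A minimal sketch of one way to time that convergence, assuming the ceph CLI is available on the admin node and polling ceph health rather than parsing the full ceph -w stream (the polling interval and the plain-text HEALTH_OK check are illustrative, not the thread's actual procedure):

    #!/usr/bin/env python3
    # Illustrative only: poll `ceph health` until the cluster reports
    # HEALTH_OK again and print how long the recovery took.
    import subprocess
    import time

    POLL_SECONDS = 10  # assumption: a 10 s polling interval is acceptable

    start = time.time()
    while True:
        out = subprocess.run(["ceph", "health"],
                             capture_output=True, text=True).stdout
        if "HEALTH_OK" in out:
            break
        time.sleep(POLL_SECONDS)

    print("cluster converged to HEALTH_OK after %.0f seconds"
          % (time.time() - start))

Started right after the Rack-A switch is powered off, this gives a single wall-clock number for "time to healthy" that can be compared across tests.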
From: Andrey Korolyov [mailto:andrey@xxxxxxx]
> The question is: is this behavior indeed expected?

The answer can be positive if you are using a large number of placement groups, and 16k is indeed a large one. The peering may take a long time, effectively blocking I/O requests during this period. Do you have a ceph -w log of this transition to share?

Kind regards,
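For context on whether 16k placement groups is large for a given cluster, a rough sketch of the commonly cited sizing guideline (about 100 PGs per OSD, divided by the replica count, rounded up to a power of two); the OSD count and replica count below are made-up example values, not numbers from this thread:

    # Rough PG-count guideline sketch (not the authoritative Ceph pg calculator):
    # target_pgs ~= (num_osds * pgs_per_osd) / replica_count, rounded up to a power of 2.
    def suggested_pg_count(num_osds: int, replica_count: int,
                           pgs_per_osd: int = 100) -> int:
        raw = num_osds * pgs_per_osd / replica_count
        power = 1
        while power < raw:
            power *= 2
        return power

    # Example values (assumptions for illustration): 30 OSDs, 3 replicas.
    print(suggested_pg_count(30, 3))  # -> 1024

The more PGs each OSD carries, the more peering work every surviving OSD has to do at once when a whole rack of OSDs goes down or comes back, which is why a very high PG count can stretch out the window during which requests are blocked.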
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com