Re: Full OSD halting a cluster - isn't this violating the "no single point of failure" promise?

Ceph is pretty awesome, but I'm not sure it can be expected to keep I/O going if there is no available capacity. Granted, OSDs aren't always balanced evenly, but generally if you've got one drive hitting the full ratio, you've probably got a lot more not far behind.
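
For reference, the thresholds involved are cluster-wide mon settings; from memory the defaults look roughly like this (please double-check against the docs for your release):

    [mon]
    mon osd nearfull ratio = .85    # cluster goes HEALTH_WARN once any OSD passes this
    mon osd full ratio = .95        # client writes are blocked once any OSD passes this

If I remember correctly these are only read when the cluster is created; on a running cluster you'd change them with something like "ceph pg set_full_ratio", but again, verify for your version.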

Although probably not recommended, it should be pretty easy to automate taking an OSD out of the cluster if it gets too full; a rough sketch of what I mean is below. Of course, the best practice is not to let OSDs get past nearfull without taking action.
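
A minimal, untested sketch of that kind of automation (the threshold is arbitrary, and the field names are what I remember from the JSON output of "ceph osd df", so verify on your own cluster before trusting it):

    #!/usr/bin/env python
    # Untested sketch: mark OSDs "out" once they pass a utilization threshold.
    import json
    import subprocess

    THRESHOLD = 90.0  # percent used; pick something below your full ratio

    def osd_utilizations():
        out = subprocess.check_output(["ceph", "osd", "df", "--format", "json"])
        data = json.loads(out.decode("utf-8"))
        # Each entry in "nodes" should carry the OSD id and its utilization in percent.
        return [(node["id"], node["utilization"]) for node in data["nodes"]]

    def main():
        for osd_id, used in osd_utilizations():
            if used >= THRESHOLD:
                print("osd.%d is %.1f%% full, marking it out" % (osd_id, used))
                subprocess.check_call(["ceph", "osd", "out", str(osd_id)])

    if __name__ == "__main__":
        main()

Keep in mind that marking an OSD out triggers backfill onto the remaining OSDs, so on an already tight cluster this can make things worse rather than better, which is part of why I wouldn't really recommend it.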


On 16 Sep 2016 19:36, "Christian Theune" <ct@xxxxxxxxxxxxxxx> wrote:
Hi,

(just in case: this isn’t intended as a rant and I hope it doesn’t get read as one. I’m trying to understand what the perspectives on potential future improvements are, and I think it would be valuable to have this discoverable in the archives)

We’ve had a “good” time recently balancing our growing cluster and did a lot of reweighting after a full OSD actually did bite us once.

Apart from paying our dues (tight monitoring, reweighting and generally hedging the cluster), I was wondering whether this behaviour is a violation of the “no single point of failure” promise: no matter how big your setup grows, a single OSD can halt practically everything. Even just stopping that OSD would unblock your cluster and let it keep going (assuming CRUSH made a particularly pathological choice and that one OSD is extremely off the curve compared to the others).

I haven’t found much on whether this is a case of “it’s the way it is and we don’t see a way forward”, or whether this behaviour is considered something that could be improved in the future, and whether there are strategies around already?

From my perspective this is directly related to how well CRUSH weighting works with respect to placing data evenly. (I would expect that in certain situations, like a single RBD cluster where all objects are identically sized, this is something CRUSH should be able to do well, but my last few weeks tell me that isn’t the case. :) )
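
A quick way to see how far things are off the curve is to compare each OSD’s utilization against the cluster average, along these lines; this is just a rough sketch, and the field names are assumed from the JSON output of “ceph osd df”, so please verify on your version:

    #!/usr/bin/env python
    # Rough sketch: show how far each OSD strays from the average utilization.
    import json
    import subprocess

    def main():
        out = subprocess.check_output(["ceph", "osd", "df", "--format", "json"])
        nodes = json.loads(out.decode("utf-8"))["nodes"]
        usages = dict((n["name"], n["utilization"]) for n in nodes)
        average = sum(usages.values()) / len(usages)
        print("average utilization: %.1f%%" % average)
        for name, used in sorted(usages.items(), key=lambda kv: kv[1], reverse=True):
            print("%-10s %5.1f%%  (%+.1f vs. average)" % (name, used, used - average))

    if __name__ == "__main__":
        main()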

An especially interesting edge case is when your cluster consists of two pools that each run on a completely disjoint set of OSDs: I guess it’s accidental (not intentional) behaviour that the one pool would affect the other, right?

Thoughts?

Hugs,
Christian

-- 
Christian Theune · ct@xxxxxxxxxxxxxxx · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
