On Tue, 13 Jan 2015, Christopher Kunz wrote:
> Hi,
>
> > Okay, it sounds like something is not quite right then. Can you attach
> > the OSDMap once it is in the not-quite-repaired state? And/or try
> > setting 'ceph osd crush tunables optimal' and see if that has any
> > effect?
> >
> Indeed it did - I set ceph osd crush tunables optimal (80% degradation)
> and unplugged one sled. After manually setting the OSDs down and out,
> the cluster degraded to over 80% again and recovered within a couple of
> minutes (I only have 14K objects there).
>
> So I probably set something to a very wrong value, or the constant
> switching between replica size 2 and 3 confused the cluster?
>
> > Cute! That kind of looks like 3 sleds of 7 in one chassis, though? Or am
> > I looking at the wrong thing?
> >
> Yeah, but the "sled" failure domain does not exist in default CRUSH
> maps. It seemed OK-ish to use "chassis" for the PoC. I might write a more
> heavily customized CRUSH map after I figure out what I can productively
> do with the cluster. :)

The types are just names; we put the default ones in there that seemed
like they would be the most common, but we could easily add sled (between
host and chassis?) if that is something that is reasonably common...

> I have one more issue that I'm trying to reproduce right now, but so far
> the "tunables optimal" trick helped tremendously, thanks!

Great!

sage
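
For anyone trying to reproduce the test described above, the sequence
boils down to roughly the following; the OSD IDs are only placeholders
for whatever disks sit in the sled that gets pulled:

    # switch to the optimal CRUSH tunables profile
    # (this triggered roughly 80% degradation on the PoC cluster above)
    ceph osd crush tunables optimal

    # after pulling a sled, mark its OSDs down and out by hand so
    # recovery starts right away instead of waiting for the down-out
    # interval to expire (osd ids 3 and 4 are just examples)
    ceph osd down 3 4
    ceph osd out 3 4

    # watch the cluster re-replicate and settle back to HEALTH_OK
    ceph -w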
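
And if someone wants a "sled" level locally before it shows up in the
defaults, the usual route is to decompile the CRUSH map, add the type,
and recompile; a rough sketch, with made-up bucket names, IDs, and
weights:

    # pull the current map out of the cluster and decompile it
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # in crushmap.txt, add the new type between host and chassis and
    # renumber everything above it, e.g.:
    #
    #   type 0 osd
    #   type 1 host
    #   type 2 sled        <- new
    #   type 3 chassis
    #   type 4 rack
    #   ...
    #
    # then declare sled buckets and hang the hosts off them, e.g.:
    #
    #   sled sled-a1 {
    #           id -20                  # any unused negative id
    #           alg straw
    #           hash 0  # rjenkins1
    #           item node1 weight 2.000
    #           item node2 weight 2.000
    #   }
    #
    # (the chassis buckets above them then list the sleds as items
    # instead of listing the hosts directly), and if replicas should
    # be spread across sleds, point the rule at the new type:
    #
    #   step chooseleaf firstn 0 type sled

    # recompile and inject the edited map
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new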