Hi,

> Okay, it sounds like something is not quite right then. Can you attach
> the OSDMap once it is in the not-quite-repaired state? And/or try
> setting 'ceph osd crush tunables optimal' and see if that has any
> effect?

Indeed it did - I set ceph osd crush tunables optimal (which itself
caused about 80% degradation) and unplugged one sled. After manually
setting the OSDs down and out, the cluster degraded to over 80% again
and recovered within a couple of minutes (I only have 14K objects
there). So I had probably set something to a very wrong value, or the
constant switching between replica size 2 and 3 confused the cluster?

> Cute! That kind of looks like 3 sleds of 7 in one chassis though? Or am
> I looking at the wrong thing?

Yeah, but a "sled" failure domain does not exist in the default CRUSH
map, and it seemed OK-ish to use "chassis" for the PoC. I might write a
more heavily customized CRUSH map after I figure out what I can
productively do with the cluster. :)

I have one more issue that I'm trying to reproduce right now, but so
far the "tunables optimal" trick has helped tremendously, thanks!

Regards,
--ck
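
P.S. For anyone who wants to repeat the experiment, this is roughly the
sequence I ran; the OSD IDs below are just placeholders, substitute
whatever lives on the sled you pull:

    # switch to the optimal tunables profile (expect heavy rebalancing)
    ceph osd crush tunables optimal

    # after pulling the sled, mark its OSDs down and out by hand
    ceph osd down 3 4 5
    ceph osd out 3 4 5

    # watch the degradation peak and then recover
    ceph -w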
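
P.P.S. If I do end up adding a real "sled" failure domain instead of
reusing "chassis", the usual route seems to be decompiling the CRUSH
map, adding the type, and compiling it back in. A minimal sketch; the
type number, file names, and rule change are only illustrative, not
something I have tested on this cluster yet:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # in crushmap.txt: add "type 2 sled" between host and chassis
    # (renumbering the types above it), declare sled buckets containing
    # the hosts, and point the placement rule at the new type:
    #   step chooseleaf firstn 0 type sled

    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new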