Re: CRUSH question - failing to rebalance after failure test

Hi,

> Okay, it sounds like something is not quite right then.  Can you attach 
> the OSDMap once it is in the not-quite-repaired state?  And/or try 
> setting 'ceph osd crush tunables optimal' and see if that has any 
> effect?
> 
Indeed it did. I set 'ceph osd crush tunables optimal' (which itself caused
roughly 80% degradation while the data re-placed) and then unplugged one
sled. After manually marking its OSDs down and out, the cluster degraded to
over 80% again and recovered within a couple of minutes (I only have about
14K objects there).

So either I had set something to a very wrong value, or the constant
switching between replica size 2 and 3 confused the cluster?
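
(For context, the switching I mean was along these lines; 'rbd' is just an
example pool name here:)

    ceph osd pool set rbd size 2
    ceph osd pool set rbd size 3
    # min_size should normally be adjusted to match
    ceph osd pool set rbd min_size 2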

> Cute!  That kind of looks like 3 sleds of 7 in one chassis though?  Or am 
> I looking at the wrong thing?
> 
Yeah, but a "sled" failure domain does not exist in the default CRUSH map,
so using "chassis" for the PoC seemed OK-ish. I might write a more heavily
customized CRUSH map once I figure out what I can productively do with the
cluster. :)
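
If anyone wants to try the same, the general workflow for a customized map
would be roughly the following (the "sled" type and rule step are only an
illustration, not something I have in place yet):

    # dump and decompile the current CRUSH map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # edit crushmap.txt, e.g. add a "sled" bucket type and use a rule
    # whose placement step is:  step chooseleaf firstn 0 type sled
    # (reusing "chassis" instead avoids touching the type list at all)

    # recompile and inject the new map
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new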

I have one more issue that I'm trying to reproduce right now, but so far
the "tunables optimal" trick has helped tremendously - thanks!

Regards,

--ck
