On 04/20/2015 04:18 PM, Robert LeBlanc wrote:
> You usually won't end up with more than the "size" number of replicas, even in a failure situation. Although technically more than "size" number of OSDs may have the data (if an OSD comes back in service, the journal may be used to quickly get it back up to speed), these would not be active.
>
> For us, using size 4 and min_size 2 means we can lose an entire rack (2 copies) without blocking I/O. Our configuration prevents four copies landing in one rack. If we lose a rack and then an OSD in the surviving rack, write I/O to those placement groups will block until the objects have been replicated elsewhere in the rack, but it would never be more than 2 copies.
>
> I hope I'm making sense and my jabbering is useful.

Yes, it is helpful, thank you. My clarity level has been upgraded from mud to stained glass.

If I am following the logic of your rule correctly:

1. Pick 2 racks to hold replicas:
   step choose firstn 2 type rack
2. Within each chosen rack, pick 2 hosts and take one OSD from each:
   step chooseleaf firstn 2 type host

I still don't understand where exactly max_size comes into play, unless you have some elaborate chain of rules, like mixing platter and SSD drives in the same pool. The documented example for that scenario is the only one I have found that uses max_size in a meaningful way.

Anyway, thanks for your help in translating from CRUSH to English.
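For what it's worth, here is roughly how I picture the whole rule in a decompiled crushmap. Only the two "step" lines come from your description; the rule name, ruleset number, size bounds, and the "default" root are just my placeholders, so correct me if I have the shape wrong:

    # Sketch of a rack-split replicated rule; names and numbers are guesses.
    rule rack_split {
            ruleset 1
            type replicated
            min_size 2          # rule-level bounds on pool size, as I understand them
            max_size 4
            step take default                     # start at the root of the hierarchy
            step choose firstn 2 type rack        # pick 2 racks
            step chooseleaf firstn 2 type host    # 1 OSD on each of 2 distinct hosts per rack
            step emit
    }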