One problem that I can see with this setup is that you will fill up the SSDs holding the primary replica before the HDD ones, if they are much different in size. Other than that, it's a very inventive solution to increase read speeds without using a possibly buggy cache configuration.

> On Apr 20, 2017, at 05:25, Richard Hesketh <richard.hesketh@xxxxxxxxxxxx> wrote:
>
>> On 19/04/17 21:08, Reed Dier wrote:
>> Hi Maxime,
>>
>> This is a very interesting concept. Instead of primary affinity being used to put the primary copy on SSD, you set the crush rule to first choose an OSD in the 'ssd-root', then the 'hdd-root' for the remaining set.
>>
>> And with 'step chooseleaf firstn {num}':
>>> If {num} > 0 && < pool-num-replicas, choose that many buckets.
>> So 1 chooses one bucket.
>>> If {num} < 0, it means pool-num-replicas - {num}
>> And -1 means it will fill the remaining replicas from this bucket.
>>
>> This is a very interesting concept, one I had not considered.
>> Really appreciate this feedback.
>>
>> Thanks,
>>
>> Reed
>>
>>> On Apr 19, 2017, at 12:15 PM, Maxime Guyot <Maxime.Guyot@xxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>>>> Assuming production level, we would keep a pretty close 1:2 SSD:HDD ratio,
>>>> 1:4-5 is common but depends on your needs and the devices in question, i.e. assuming LFF drives and that you aren't using crummy journals.
>>>
>>> You might be speaking about different ratios here. I think that Anthony is speaking about the journal:OSD ratio and Reed about the capacity ratio between the HDD and SSD tiers/roots.
>>>
>>> I have been experimenting with hybrid setups (1 copy on SSD + 2 copies on HDD). As Richard says, you'll get much better random read performance with the primary OSD on SSD, but write performance won't be amazing since you still have 2 HDD copies to write before the ACK.
>>>
>>> I know the doc suggests using primary affinity, but since it's an OSD-level setting it does not play well with other storage tiers, so I searched for other options. From what I have tested, a rule that selects the first/primary OSD from the ssd-root and then the rest of the copies from the hdd-root works. Though I am not sure it is *guaranteed* that the first OSD selected will be the primary.
>>>
>>> "rule hybrid {
>>>     ruleset 2
>>>     type replicated
>>>     min_size 1
>>>     max_size 10
>>>     step take ssd-root
>>>     step chooseleaf firstn 1 type host
>>>     step emit
>>>     step take hdd-root
>>>     step chooseleaf firstn -1 type host
>>>     step emit
>>> }"
>>>
>>> Cheers,
>>> Maxime
>
> FWIW, splitting my HDDs and SSDs into two separate roots and using a crush rule to first choose a host from the SSD root and take the remaining replicas on the HDD root was the way I did it, too. By inspection, it did seem that all PGs in the pool had an SSD for a primary, so I think this is a reliable way of doing it. You would of course end up with an acting primary on one of the slow spinners for a brief period if you lost an SSD for whatever reason and it needed to rebalance.
>
> The only downside is that if you have your SSD and HDD OSDs on the same physical hosts, I'm not sure how you would set up your failure domains and rules to make sure that you don't take an SSD primary and an HDD replica on the same host. In my case, SSDs and HDDs are on different hosts, so it didn't matter to me.
> --
> Richard Hesketh
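
For anyone who wants to try Maxime's rule: a rough sketch of how you might compile it into the CRUSH map and point a pool at it is below. The pool name "rbd" and the file names are only placeholders, and on Luminous and later the pool setting is "crush_rule <name>" rather than "crush_ruleset <id>".

    # Sketch only -- pool name "rbd" and file names are placeholders.
    # Dump and decompile the current CRUSH map:
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # ...add the "hybrid" rule (ruleset 2) to crushmap.txt, then recompile and inject:
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new
    # Point a pool at the new rule (pre-Luminous syntax; Luminous and later use
    # "ceph osd pool set <pool> crush_rule hybrid"):
    ceph osd pool set rbd crush_ruleset 2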
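
And on Richard's "by inspection" check, a rough way to confirm that every PG in the pool has its acting primary on an OSD under the ssd-root (pool id 5 is a placeholder; the pgs_brief column layout can differ a little between releases):

    # Placeholder pool id 5 -- check yours with "ceph osd pool ls detail".
    # Note the OSD ids that sit beneath ssd-root:
    ceph osd tree
    # Acting primary for every PG in pool 5 (pgs_brief columns are roughly
    # pgid / state / up / up_primary / acting / acting_primary):
    ceph pg dump pgs_brief 2>/dev/null | awk '$1 ~ /^5\./ {print $1, $NF}' | sort -u

If any of the printed acting primaries are OSD ids from under the hdd-root, the rule isn't doing what you expect, or you've caught the pool mid-rebalance after losing an SSD.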