Re: SSD Primary Affinity

Richard, it would be simple to require that an SSD and an HDD on the same host are not used together.  Set your failure domain to rack instead of host and put the SSD host and the HDD host in the same rack.  Ceph has no way of actually telling what a rack is, so it's whatever you define it to be in your crush map.
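
For illustration, a rack in the crush map is just another bucket you declare yourself; a minimal sketch, with invented host/rack names and placeholder ids/weights:

  rack rack1 {
          id -10                         # placeholder id
          alg straw
          hash 0                         # rjenkins1
          item ssd-host1 weight 1.000    # host bucket holding the SSD OSDs
          item hdd-host1 weight 4.000    # host bucket holding the HDD OSDs
  }

The hybrid rules quoted below would then use "type rack" instead of "type host" in their chooseleaf steps.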


On Thu, Apr 20, 2017, 5:26 AM Richard Hesketh <richard.hesketh@xxxxxxxxxxxx> wrote:
On 19/04/17 21:08, Reed Dier wrote:
> Hi Maxime,
>
> This is a very interesting concept. Instead of using primary affinity to put the primary copy on SSD, you set a crush rule to first choose an OSD in the ‘ssd-root’, then the ‘hdd-root’ for the remaining copies.
>
> And with ‘step chooseleaf firstn {num}’
>> If {num} > 0 && < pool-num-replicas, choose that many buckets.
> So 1 chooses that bucket
>> If {num} < 0, it means pool-num-replicas minus |{num}|.
> And -1 means it will fill the remaining replicas from this bucket (e.g. with 3 replicas, firstn -1 selects 3 - 1 = 2).
>
> This is a very interesting concept, one I had not considered.
> Really appreciate this feedback.
>
> Thanks,
>
> Reed
>
>> On Apr 19, 2017, at 12:15 PM, Maxime Guyot <Maxime.Guyot@xxxxxxxxx> wrote:
>>
>> Hi,
>>
>>>> Assuming production level, we would keep a pretty close 1:2 SSD:HDD ratio,
>>> 1:4-5 is common but depends on your needs and the devices in question, i.e. assuming LFF drives and that you aren’t using crummy journals.
>>
>> You might be speaking about different ratios here. I think Anthony is speaking about the journal:OSD ratio and Reed about the capacity ratio between the HDD and SSD tiers/roots.
>>
>> I have been experimenting with hybrid setups (1 copy on SSD + 2 copies on HDD). Like Richard says, you’ll get much better random read performance with the primary OSD on SSD, but write performance won’t be amazing since you still have 2 HDD copies to write before the ACK.
>>
>> I know the doc suggests using primary affinity, but since it’s an OSD-level setting it does not play well with other storage tiers, so I looked for other options. From what I have tested, a rule that selects the first/primary OSD from the ssd-root and then the rest of the copies from the hdd-root works. Though I am not sure it is *guaranteed* that the first OSD selected will be the primary.
>>
>> “rule hybrid {
>>  ruleset 2
>>  type replicated
>>  min_size 1
>>  max_size 10
>>  step take ssd-root
>>  step chooseleaf firstn 1 type host
>>  step emit
>>  step take hdd-root
>>  step chooseleaf firstn -1 type host
>>  step emit
>> }”
>>
>> Cheers,
>> Maxime
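
A minimal sketch of wiring that rule in, for anyone following along (file and pool names here are just examples, and on pre-Luminous releases the pool property is crush_ruleset rather than crush_rule):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt    # add the hybrid rule to the decompiled map
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new
  ceph osd pool set rbd crush_ruleset 2        # "rbd" is only an example pool; 2 matches the ruleset number above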

FWIW splitting my HDDs and SSDs into two separate roots and using a crush rule to first choose a host from the SSD root and take remaining replicas on the HDD root was the way I did it, too. By inspection, it did seem that all PGs in the pool had an SSD for a primary, so I think this is a reliable way of doing it. You would of course end up with an acting primary on one of the slow spinners for a brief period if you lost an SSD for whatever reason and it needed to rebalance.
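
One quick way to spot-check that (a rough sketch; the pool id 2 and the awk filter are only for illustration):

  ceph osd tree                                # note which OSD ids sit under ssd-root
  ceph pg dump pgs_brief | awk '$1 ~ /^2\./ {print $1, $NF}'    # pg id + acting primary for pool 2

If every acting primary printed belongs to ssd-root, the rule is doing its job.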

The only downside is that if you have your SSD and HDD OSDs on the same physical hosts, I'm not sure how you would set up your failure domains and rules to make sure you don't take an SSD primary and an HDD replica on the same host. In my case the SSDs and HDDs are on different hosts, so it didn't matter to me.
--
Richard Hesketh

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
