Re: SSD-primary crush rule doesn't work as intended

I won't run out of write IOPS, since I have SSD journals in place. I know that I can use the dual-root method from Sebastien's website, but I thought the 'storage class' feature was the way to solve this kind of problem.

https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
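
As a rough sketch of the storage class approach (assuming Luminous or later; the rule and pool names below are only examples), a class-constrained replicated rule can be created straight from the CLI without hand-editing the crush map:

    # replicated rule that only picks OSDs of class ssd,
    # with host as the failure domain under the default root
    ceph osd crush rule create-replicated ssd-only default host ssd

    # assign the rule to a pool (pool name chosen for illustration)
    ceph osd pool set mypool crush_rule ssd-only

That only covers single-class pools, though; a hybrid SSD-primary layout still needs a hand-edited rule like the one quoted further down.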

Regards,
Horace Ng


From: "Peter Linder" <peter.linder@xxxxxxxxxxxxxx>
To: "Paul Emmerich" <paul.emmerich@xxxxxxxx>, "horace" <horace@xxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Thursday, May 24, 2018 3:46:59 PM
Subject: Re: SSD-primary crush rule doesn't work as intended

It will also only work reliably if you use a single-level tree structure with failure domain "host". If you want, say, separate data center failure domains, you need extra steps to make sure an SSD host and an HDD host do not get selected from the same DC.

I have done such a layout, so it is possible (see my older posts), but you need to be careful when constructing the additional trees that are needed in order to force the correct selections.

In reality, however, even if you force all reads to the SSDs using primary affinity, you will soon run out of write IOPS on the HDDs. To keep up with the SSDs you will need so many HDDs for an average workload that, by the time performance holds up, you will not have saved any money.
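
For context, a minimal sketch of the primary-affinity part (OSD ids taken from the crush tree quoted further down; older releases may additionally need mon_osd_allow_primary_affinity = true):

    # keep the HDD OSDs from being chosen as primary...
    for i in $(seq 0 17); do ceph osd primary-affinity osd.$i 0; done
    # ...so the SSD OSDs (default affinity 1.0) end up serving the reads
    for i in $(seq 18 23); do ceph osd primary-affinity osd.$i 1; done

As noted above, this only helps reads; every write still has to land on the HDD replicas.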

Regards,

Peter



Den 2018-05-23 kl. 14:37, skrev Paul Emmerich:
You can't mix HDDs and SSDs in the same server if you want to use such a rule.
The selection step after the "emit" can't know which server was selected previously.
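
One quick (and purely illustrative) way to see this is to test the rule offline against the current map and look for mappings that repeat a host:

    # dump the in-use crush map and exercise rule id 2 with 3 replicas
    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 2 --num-rep 3 --show-mappings

Each take/chooseleaf/emit pass is evaluated independently, so nothing stops the HDD pass from landing on the host that already supplied the SSD primary.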

Paul

2018-05-23 11:02 GMT+02:00 Horace <horace@xxxxxxxxx>:
To add to the info, I have a slightly modified rule that takes advantage of the new storage classes.

rule ssd-hybrid {
        id 2
        type replicated
        min_size 1
        max_size 10
        # first pass: pick the primary from an SSD host
        step take default class ssd
        step chooseleaf firstn 1 type host
        step emit
        # second pass: pick the remaining replicas from HDD hosts
        step take default class hdd
        step chooseleaf firstn -1 type host
        step emit
}
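
For reference, a hand-edited rule like this is normally applied by round-tripping the crush map (just a sketch; file names are arbitrary):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt   # decompile, add/edit the rule
    crushtool -c crushmap.txt -o crushmap.new   # recompile
    ceph osd setcrushmap -i crushmap.new
    ceph osd pool set ssd-hybrid crush_rule ssd-hybrid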

Regards,
Horace Ng

----- Original Message -----
From: "horace" <horace@xxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, May 23, 2018 3:56:20 PM
Subject: SSD-primary crush rule doesn't work as intended

I've set up the rule according to the doc, but some of the PGs are still being assigned to the same host.

http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/

rule ssd-primary {
        ruleset 5
        type replicated
        min_size 5
        max_size 10
        step take ssd
        step chooseleaf firstn 1 type host
        step emit
        step take platter
        step chooseleaf firstn -1 type host
        step emit
}

Crush tree:

[root@ceph0 ~]#    ceph osd crush tree
ID CLASS WEIGHT   TYPE NAME     
-1       58.63989 root default   
-2       19.55095     host ceph0
 0   hdd  2.73000         osd.0 
 1   hdd  2.73000         osd.1 
 2   hdd  2.73000         osd.2 
 3   hdd  2.73000         osd.3 
12   hdd  4.54999         osd.12
15   hdd  3.71999         osd.15
18   ssd  0.20000         osd.18
19   ssd  0.16100         osd.19
-3       19.55095     host ceph1
 4   hdd  2.73000         osd.4 
 5   hdd  2.73000         osd.5 
 6   hdd  2.73000         osd.6 
 7   hdd  2.73000         osd.7 
13   hdd  4.54999         osd.13
16   hdd  3.71999         osd.16
20   ssd  0.16100         osd.20
21   ssd  0.20000         osd.21
-4       19.53799     host ceph2
 8   hdd  2.73000         osd.8 
 9   hdd  2.73000         osd.9 
10   hdd  2.73000         osd.10
11   hdd  2.73000         osd.11
14   hdd  3.71999         osd.14
17   hdd  4.54999         osd.17
22   ssd  0.18700         osd.22
23   ssd  0.16100         osd.23

#ceph pg ls-by-pool ssd-hybrid

27.8       1051                  0        0         0       0 4399733760 1581     1581               active+clean 2018-05-23 06:20:56.088216 27957'185553 27959:368828  [23,1,11]         23  [23,1,11]             23 27953'182582 2018-05-23 06:20:56.088172    27843'162478 2018-05-20 18:28:20.118632

With osd.23 and osd.11 being assigned to the same host (ceph2).
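
A quick way to double-check where those two OSDs live (output trimmed; osd ids taken from the listing above):

    ceph osd find osd.23   # reports crush_location, host=ceph2 per the tree above
    ceph osd find osd.11   # same host according to the crush tree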

Regards,
Horace Ng



--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
