Horace Ng
To: "Paul Emmerich" <paul.emmerich@xxxxxxxx>, "horace" <horace@xxxxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Thursday, May 24, 2018 3:46:59 PM
Subject: Re: SSD-primary crush rule doesn't work as intended
It will also only work reliably if you use a single-level tree
structure with failure domain "host". If you want, say, separate
data-center failure domains, you need extra steps to make sure an
SSD host and an HDD host do not get selected from the same DC.
I have done such a layout, so it is possible (see my older posts), but you need to be careful when you construct the additional trees that are needed to force the correct selections.
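As a rough sketch of the idea (bucket names are placeholders here, not my actual layout): give each choice its own root so CRUSH cannot land both picks in one DC, e.g.

rule ssd-hybrid-two-dc {
        id 3
        type replicated
        min_size 2
        max_size 3
        # SSD primary taken only from the SSD tree of one DC
        step take dc1-ssd
        step chooseleaf firstn 1 type host
        step emit
        # HDD replicas taken only from the HDD tree of the other DC
        step take dc2-hdd
        step chooseleaf firstn -1 type host
        step emit
}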
In reality, however, even if you force all reads to the SSDs using
primary affinity, you will soon run out of write IOPS on the HDDs.
To keep up with the SSDs you will need so many HDDs for an
average workload that you will not save any money.
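(For reference, primary affinity is a per-OSD value between 0 and 1; an OSD with 0 is not used as primary when it can be avoided. On the cluster below that would be something like:

# ceph osd primary-affinity osd.18 1.0
# ceph osd primary-affinity osd.0 0

for an SSD and an HDD OSD respectively.)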
Regards,
Peter
You can't mix HDDs and SSDs in a server if you want to use such a rule.
The next selection step after "emit" can't know which server was selected previously.
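You can see this for yourself by running the rule through crushtool against the compiled map (crushmap.bin is just an example file name; rule 2 is the id of the ssd-hybrid rule below):

# crushtool -i crushmap.bin --test --rule 2 --num-rep 3 --show-mappings

This prints the OSD sets the rule produces, including the mappings that put two OSDs on the same host.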
Paul
2018-05-23 11:02 GMT+02:00 Horace <horace@xxxxxxxxx>:
To add to the info: I have a slightly modified rule that takes advantage of the new storage classes.
rule ssd-hybrid {
        id 2
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 1 type host
        step emit
        step take default class hdd
        step chooseleaf firstn -1 type host
        step emit
}
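The pool was then pointed at the rule with something like (Luminous syntax; older releases use crush_ruleset instead of crush_rule):

# ceph osd pool set ssd-hybrid crush_rule ssd-hybrid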
Regards,
Horace Ng
----- Original Message -----
From: "horace" <horace@xxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Wednesday, May 23, 2018 3:56:20 PM
Subject: SSD-primary crush rule doesn't work as intended
I've set up the rule according to the doc, but some of the PGs are still being assigned to the same host.
http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/
rule ssd-primary {
        ruleset 5
        type replicated
        min_size 5
        max_size 10
        step take ssd
        step chooseleaf firstn 1 type host
        step emit
        step take platter
        step chooseleaf firstn -1 type host
        step emit
}
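For reference, the rule went in through the usual crush map edit cycle from that page (file names are just examples):

# ceph osd getcrushmap -o crushmap.bin
# crushtool -d crushmap.bin -o crushmap.txt
  ...add the rule above to crushmap.txt...
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new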
Crush tree:
[root@ceph0 ~]# ceph osd crush tree
ID CLASS WEIGHT   TYPE NAME
-1       58.63989 root default
-2       19.55095     host ceph0
 0   hdd  2.73000         osd.0
 1   hdd  2.73000         osd.1
 2   hdd  2.73000         osd.2
 3   hdd  2.73000         osd.3
12   hdd  4.54999         osd.12
15   hdd  3.71999         osd.15
18   ssd  0.20000         osd.18
19   ssd  0.16100         osd.19
-3       19.55095     host ceph1
 4   hdd  2.73000         osd.4
 5   hdd  2.73000         osd.5
 6   hdd  2.73000         osd.6
 7   hdd  2.73000         osd.7
13   hdd  4.54999         osd.13
16   hdd  3.71999         osd.16
20   ssd  0.16100         osd.20
21   ssd  0.20000         osd.21
-4       19.53799     host ceph2
 8   hdd  2.73000         osd.8
 9   hdd  2.73000         osd.9
10   hdd  2.73000         osd.10
11   hdd  2.73000         osd.11
14   hdd  3.71999         osd.14
17   hdd  4.54999         osd.17
22   ssd  0.18700         osd.22
23   ssd  0.16100         osd.23
# ceph pg ls-by-pool ssd-hybrid
27.8 1051 0 0 0 0 4399733760 1581 1581 active+clean 2018-05-23 06:20:56.088216 27957'185553 27959:368828 [23,1,11] 23 [23,1,11] 23 27953'182582 2018-05-23 06:20:56.088172 27843'162478 2018-05-20 18:28:20.118632
With osd.23 and osd.11 being assigned to the same host.
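(In the tree above both osd.23 and osd.11 sit under host ceph2; "ceph osd find 23" and "ceph osd find 11" confirm the same crush_location.)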
Regards,
Horace Ng
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90