Re: Ceph snapshots

Marc Schöchlin <ms@xxxxxxxxxx> · Fri, 29 Jun 2018 18:20:04 +0200



    This looks interesting. Unfortunately, it seems the setting cannot
      be changed dynamically:

    
    # ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.025'

      osd.0: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)
      osd.1: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)
      osd.2: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)
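
On a cluster with many OSDs it can help to collect these replies in a script and list the daemons that printed the "not observed" note. The following is only a minimal Python sketch: it assumes the reply format is exactly the one shown in the output above, and the function name parse_injectargs and the field name restart_hint are made up for illustration.

```python
import re

# One reply per OSD, in the format shown in this message, e.g.
# "osd.0: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)"
REPLY_RE = re.compile(r"(osd\.\d+): (\w+) = '([^']*)'\s*(\(not observed[^)]*\))?")


def parse_injectargs(output):
    """Parse captured injectargs output into one dict per OSD reply."""
    # Mail clients and terminals wrap long lines, so collapse whitespace first.
    flat = " ".join(output.split())
    return [
        {
            "osd": m.group(1),
            "option": m.group(2),
            "value": m.group(3),
            # True when the daemon printed the "(not observed, ...)" note.
            "restart_hint": m.group(4) is not None,
        }
        for m in REPLY_RE.finditer(flat)
    ]


if __name__ == "__main__":
    captured = """
    osd.0: osd_snap_trim_sleep = '0.025000' (not observed, change may
    require restart)
    osd.1: osd_snap_trim_sleep = '0.025000' (not observed, change may
    require restart)
    """
    for reply in parse_injectargs(captured):
        if reply["restart_hint"]:
            print(reply["osd"], "may need a restart for", reply["option"])
```

The captured text would come from saving the command output to a file or a pipe; the sketch does not talk to the cluster itself.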

    
    On 29.06.2018 at 17:36, Paul Emmerich wrote:

    
        It's usually the snapshot deletion that triggers slowness.
          Are you also deleting/rotating old snapshots when creating new
          ones?
        

        In that case, try increasing osd_snap_trim_sleep a little.
          Even a value of 0.025 can help a lot when there are many
          concurrent snapshot deletions.

        
        (That's what we set as default for exactly this reason -
          users see snapshot deletion as instant and cheap, but it can
          be quite expensive)

        
        Paul
        

        2018-06-29 17:28 GMT+02:00 Marc
          Schöchlin <ms@xxxxxxxxxx>:

          
              Hi Gregory,
              thanks for the link - very interesting talk.

                You mentioned the following settings in your talk, but I
                was not able to find any documentation for them in the
                OSD config reference:

                (http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/)

                
                My cluster's settings look like this (luminous/12.2.5):

              
              osd_snap_trim_cost = 1048576
              osd_snap_trim_priority = 5
              osd_snap_trim_sleep = 0.000000
              mon_osd_snap_trim_queue_warn_on = 32768

              
              I am currently seeing messages like this:

                
                2018-06-29 12:17:47.230028 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534846 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 22 slow requests are blocked > 32 sec)
                2018-06-29 12:17:47.230069 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534847 : cluster [INF] Cluster is now healthy
                2018-06-29 12:18:03.287947 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534876 : cluster [WRN] Health check failed: 24 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:18:08.307626 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534882 : cluster [WRN] Health check update: 70 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:18:14.325471 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534889 : cluster [WRN] Health check update: 79 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:18:24.502586 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534900 : cluster [WRN] Health check update: 84 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:18:34.489700 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534911 : cluster [WRN] Health check update: 17 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:18:39.489982 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534917 : cluster [WRN] Health check update: 19 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:18:44.490274 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534923 : cluster [WRN] Health check update: 40 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:18:52.620025 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534932 : cluster [WRN] Health check update: 92 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:18:58.641621 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534939 : cluster [WRN] Health check update: 32 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:19:02.653015 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534948 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 32 slow requests are blocked > 32 sec)
                2018-06-29 12:19:02.653048 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534949 : cluster [INF] Cluster is now healthy
                2018-06-29 12:19:08.674106 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534956 : cluster [WRN] Health check failed: 15 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:19:14.491798 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534963 : cluster [WRN] Health check update: 14 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:19:19.492129 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534969 : cluster [WRN] Health check update: 32 slow requests are blocked > 32 sec (REQUEST_SLOW)
                2018-06-29 12:19:22.726667 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534973 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 32 slow requests are blocked > 32 sec)
                2018-06-29 12:19:22.726697 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534974 : cluster [INF] Cluster is now healthy
                2018-06-29 13:00:00.000121 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1537844 : cluster [INF] overall HEALTH_OK
              Is that related to snap trimming?
              I am currently migrating 250 virtual machines to my new
                and shiny cluster (2448 PGs, 72 OSDs: 48 HDD and 24 SSD,
                on 5 OSD nodes), and these messages appear with some
                delay after the daily RBD snapshot creation.
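
One way to check whether the slow-request bursts line up with the snapshot schedule is to pull the REQUEST_SLOW counts and their timestamps out of the cluster log. The following is only a rough Python sketch: it assumes the log entries look like the excerpt quoted in this message, and the function name slow_request_samples is made up for illustration.

```python
import re

# Count in messages such as
# "Health check update: 70 slow requests are blocked > 32 sec (REQUEST_SLOW)".
SLOW_RE = re.compile(r"(\d+) slow requests are blocked")
# Timestamp that starts each cluster log entry, e.g. "2018-06-29 12:18:08.307626".
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+")


def slow_request_samples(log_text):
    """Return (timestamp, count) pairs for [WRN] slow-request entries."""
    # Entries may be wrapped across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    # Every timestamp marks the start of a new entry.
    starts = [m.start() for m in TS_RE.finditer(flat)]
    samples = []
    for begin, end in zip(starts, starts[1:] + [len(flat)]):
        entry = flat[begin:end]
        if "[WRN]" not in entry:
            continue  # skip the [INF] "cleared" and "healthy" entries
        count = SLOW_RE.search(entry)
        if count:
            timestamp = TS_RE.match(entry).group(0)
            samples.append((timestamp, int(count.group(1))))
    return samples


if __name__ == "__main__":
    excerpt = """
    2018-06-29 12:18:03.287947 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
      1534876 : cluster [WRN] Health check failed: 24 slow
      requests are blocked > 32 sec (REQUEST_SLOW)
    2018-06-29 12:18:08.307626 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
      1534882 : cluster [WRN] Health check update: 70 slow
      requests are blocked > 32 sec (REQUEST_SLOW)
    """
    for timestamp, count in slow_request_samples(excerpt):
        print(timestamp, count)
```

Comparing the resulting timestamps with the times at which snapshots are created and rotated would show whether the bursts follow the deletions rather than the creations.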

              
              Regards
              
                  Marc

                  
                  On 29.06.2018 at 04:27, Gregory Farnum wrote:

                  
                  You may find my talk at
                    OpenStack Boston’s Ceph day last year to be useful:
                    https://www.youtube.com/watch?v=rY0OWtllkn8

                    -Greg

                    
                      On Wed, Jun 27, 2018 at 9:06 AM
                        Marc Schöchlin <ms@xxxxxxxxxx>
                        wrote:

                      
                      Hello list,

                        
                        I currently keep 3 snapshots per RBD image for
                        my virtual systems.

                        
                        What I am missing in the current documentation:

                        
                          * details about the implementation of snapshots
                              o implementation details
                              o which scenarios create high overhead per snapshot
                              o what causes the really short performance degradation on
                                snapshot creation/deletion
                              o why do I not see a significant RBD performance degradation
                                if there are numerous snapshots
                              o ...
                          * details and recommendations about the overhead of snapshots
                              o what performance penalty do I have to expect for a
                                write/read IOP
                              o what are the edge cases of the implementation
                              o how many snapshots per image (i.e. virtual machine) might
                                be a good idea
                              o ...

                        
                        Regards

                        Marc

                        
                        On 27.06.2018 at 15:37, Brian wrote:

                        > Hi John
                        >
                        > Have you looked at the Ceph documentation?
                        >
                        > RBD: http://docs.ceph.com/docs/luminous/rbd/rbd-snapshot/
                        >
                        > The Ceph project documentation is really good for most areas. Have a
                        > look at what you can find, then come back with more specific questions!
                        >
                        > Thanks
                        > Brian
                        >
                        > On Wed, Jun 27, 2018 at 2:24 PM, John Molefe <John.Molefe@xxxxxxxxx> wrote:
                        >> Hi everyone
                        >>
                        >> I would like some advice and insight into how Ceph snapshots work and how
                        >> they can be set up.
                        >>
                        >> Responses will be much appreciated.
                        >>
                        >> Thanks
                        >> John
                        >>
                        >> Vrywaringsklousule / Disclaimer:
                        >> http://www.nwu.ac.za/it/gov-man/disclaimer.html
                        >>

                        >> _______________________________________________

                        >> ceph-users mailing list

                        >> ceph-users@xxxxxxxxxxxxxx

                        >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

                        >>


                        

                      

            
        -- 

        
                  Paul Emmerich

                    
                    Looking for help with your Ceph cluster? Contact us
                    at https://croit.io

                    
                    croit GmbH

                    Freseniusstr. 31h

                    81247 München

                    www.croit.io

                    Tel: +49 89 1896585 90

                  