[root@a2mon002 ~]# ceph -s
  cluster:
    id:     XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    health: HEALTH_ERR
            nodown,norebalance,noscrub,nodeep-scrub flag(s) set
            1 nearfull osd(s)
            19 pool(s) nearfull
            1 scrub errors
            Reduced data availability: 6014 pgs inactive, 3 pgs down, 5958 pgs peering, 83 pgs stale
            Possible data damage: 1 pg inconsistent
            Degraded data redundancy: 1601/81648846 objects degraded (0.002%), 4 pgs degraded, 5 pgs undersized
            1048 slow requests are blocked > 32 sec

  services:
    mon: 3 daemons, quorum a2mon002,a2mon003,a2mon004 (age 17m)
    mgr: a2mon004(active, since 53m), standbys: a2mon003, a2mon002
    mds: cephfs:2 {0=a2mon004=up:active(laggy or crashed),1=a2mon003=up:active(laggy or crashed)} 1 up:standby
    osd: 143 osds: 141 up, 137 in; 486 remapped pgs
         flags nodown,norebalance,noscrub,nodeep-scrub

  data:
    pools:   20 pools, 6288 pgs
    objects: 27.22M objects, 98 TiB
    usage:   308 TiB used, 114 TiB / 422 TiB avail
    pgs:     0.048% pgs unknown
             95.611% pgs not active
             1601/81648846 objects degraded (0.002%)
             53012/81648846 objects misplaced (0.065%)
             5379 peering
             495  remapped+peering
             269  active+clean
             75   stale+peering
             46   activating
             7    stale+remapped+peering
             3    unknown
             3    active+undersized+degraded
             3    down
             2    activating+remapped
             1    activating+undersized
             1    active+clean+scrubbing
             1    remapped+inconsistent+peering
             1    activating+undersized+degraded
             1    stale+activating
             1    creating+peering
[root@a2mon002 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 141
    },
    "mds": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 148
    }
}
We had seen the slow peering shortly after the Nautilus upgrade, but it eventually recovered. We then started filling the cluster up to test another Nautilus bug (https://tracker.ceph.com/issues/41255), but then a disk started to die (which caused the inconsistent PG). When we marked it out, we ran into this peering problem again, but it seems much worse this time.
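For reference, the inconsistent PG itself can be inspected with something along these lines, where <pgid> stands in for whatever PG ceph health detail reports:

    ceph health detail | grep -i inconsistent
    # list the objects the scrub flagged before deciding whether a repair is safe
    rados list-inconsistent-obj <pgid> --format=json-pretty
    # shows which OSDs a stuck PG is waiting on while peering
    ceph pg <pgid> query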
Thanks,
Bryan
Hey Bryan,
I suppose all nodes are using jumbo frames (MTU 9000), right?
I would suggest checking OSD->MON communication.
Can you send us the output of these commands?
* ceph -s
* ceph versions
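If jumbo frames are configured, one quick sanity check is a do-not-fragment ping with a near-MTU payload from an OSD host to each mon, something like the following (the host name is just taken from the output above):

    # 8972 = 9000-byte MTU minus 28 bytes of IP + ICMP headers
    ping -M do -s 8972 -c 3 a2mon002
    # a standard-size probe for comparison, to rule out basic reachability problems
    ping -M do -s 1472 -c 3 a2mon002

If the large ping fails while the small one succeeds, something in the path is dropping jumbo frames.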
[]'s
Arthur (aKa Guilherme Geronimo)
On 04/09/2019 14:18, Bryan Stillwell wrote:
Our test cluster is seeing a problem where peering is going incredibly slow shortly after upgrading it to Nautilus (14.2.2) from Luminous (12.2.12).
From what I can tell it seems to be caused by "wait for new map" taking a long time. When looking at dump_historic_slow_ops on pretty much any OSD I see stuff like this:
# ceph daemon osd.112 dump_historic_slow_ops
[...snip...]
    {
        "description": "osd_pg_create(e180614 287.4b:177739 287.75:177739 287.1c3:177739 287.1cf:177739 287.1e1:177739 287.2dd:177739 287.2fc:177739 287.342:177739 287.382:177739)",
        "initiated_at": "2019-09-03 15:12:41.366514",
        "age": 4800.8847047119998,
        "duration": 4780.0579745630002,
        "type_data": {
            "flag_point": "started",
            "events": [
                {
                    "time": "2019-09-03 15:12:41.366514",
                    "event": "initiated"
                },
                {
                    "time": "2019-09-03 15:12:41.366514",
                    "event": "header_read"
                },
                {
                    "time": "2019-09-03 15:12:41.366501",
                    "event": "throttled"
                },
                {
                    "time": "2019-09-03 15:12:41.366547",
                    "event": "all_read"
                },
                {
                    "time": "2019-09-03 15:39:03.379456",
                    "event": "dispatched"
                },
                {
                    "time": "2019-09-03 15:39:03.379477",
                    "event": "wait for new map"
                },
                {
                    "time": "2019-09-03 15:39:03.522376",
                    "event": "wait for new map"
                },
                {
                    "time": "2019-09-03 15:53:55.912499",
                    "event": "wait for new map"
                },
                {
                    "time": "2019-09-03 15:59:37.909063",
                    "event": "wait for new map"
                },
                {
                    "time": "2019-09-03 16:00:43.356023",
                    "event": "wait for new map"
                },
                {
                    "time": "2019-09-03 16:20:50.575498",
                    "event": "wait for new map"
                },
                {
                    "time": "2019-09-03 16:31:48.689415",
                    "event": "started"
                },
                {
                    "time": "2019-09-03 16:32:21.424489",
                    "event": "done"
                }
            ]
        }
    }
It always seems to be in osd_pg_create() with multiple "wait for new map" messages before it finally does something. What could be causing it to take so long to get the OSD map? The mons don't appear to be overloaded in any way.
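One way to tell whether an OSD is genuinely behind on maps, rather than just slow to process the pg_create messages, is to compare the epoch it has caught up to against the cluster's current epoch; osd.112 here is just the daemon from the example above:

    # current osdmap epoch according to the mons
    ceph osd stat
    # oldest_map / newest_map as this OSD sees them (run on the OSD's host)
    ceph daemon osd.112 status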
Thanks,
Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx