Yes, these 8 PGs have been in this 'remapped' state for quite
a while. I don't know why CRUSH hasn't seen fit to designate new
OSDs for them so that acting and up match.
For the error in question: the ceph upgrade is saying that only 1 PG
would become offline if the OSD(s) were stopped, so if these 8 PGs
were causing the problem, I would have expected it to tell me that specifically.
Greg, is there a way I could check whether CRUSH is failing to map
properly, and figure out why? Because HEALTH_OK is shown even with
these 8 active+clean+remapped PGs, I assumed it was normal/okay for
them to be in this state.
PG_STAT | OBJECTS | MISSING_ON_PRIMARY | DEGRADED | MISPLACED | UNFOUND | BYTES | OMAP_BYTES* | OMAP_KEYS* | LOG | DISK_LOG | STATE | STATE_STAMP | VERSION | REPORTED | UP | UP_PRIMARY | ACTING | ACTING_PRIMARY | LAST_SCRUB | SCRUB_STAMP | LAST_DEEP_SCRUB | DEEP_SCRUB_STAMP | SNAPTRIMQ_LEN |
7.11 | 42 | 0 | 0 | 42 | 0 | 1.76E+08 | 0 | 0 | 631 | 631 | active+clean+remapped | 2022-02-10T05:16:21.791091+0000 | 8564'631 | 10088:11277 | [15,7] | 15 | [15,7,11] | 15 | 8564'631 | 2022-02-10T05:16:21.791028+0000 | 8564'631 | 2022-02-08T17:38:26.806576+0000 | 0 |
9.17 | 23 | 0 | 0 | 23 | 0 | 88554155 | 0 | 0 | 2700 | 2700 | active+clean+remapped | 2022-02-09T22:40:19.229658+0000 | 9668'2700 | 10088:15778 | [22,9] | 22 | [22,9,2] | 22 | 9668'2700 | 2022-02-09T22:40:19.229581+0000 | 9668'2700 | 2022-02-06T13:09:04.264912+0000 | 0 |
11.10 | 3 | 0 | 0 | 3 | 0 | 9752576 | 0 | 0 | 6323 | 6323 | active+clean+remapped | 2022-02-10T16:56:10.410048+0000 | 6255'6323 | 10088:23237 | [0,19] | 0 | [0,19,2] | 0 | 6255'6323 | 2022-02-10T16:56:10.409954+0000 | 6255'6323 | 2022-02-05T18:08:35.490642+0000 | 0 |
11.19 | 2 | 0 | 0 | 2 | 0 | 4194304 | 0 | 0 | 10008 | 10008 | active+clean+remapped | 2022-02-09T21:52:33.190075+0000 | 9862'14908 | 10088:38973 | [19,9] | 19 | [19,9,12] | 19 | 9862'14908 | 2022-02-09T21:52:33.190002+0000 | 8852'14906 | 2022-02-04T21:34:27.141103+0000 | 0 |
11.1a | 2 | 0 | 0 | 2 | 0 | 4194323 | 0 | 0 | 8522 | 8522 | active+clean+remapped | 2022-02-10T10:08:29.451623+0000 | 5721'8522 | 10088:29920 | [12,24] | 12 | [12,24,28] | 12 | 5721'8522 | 2022-02-10T10:08:29.451543+0000 | 5721'8522 | 2022-02-09T04:45:34.096178+0000 | 0 |
7.1a | 67 | 0 | 0 | 67 | 0 | 2.81E+08 | 0 | 0 | 1040 | 1040 | active+clean+remapped | 2022-02-09T18:39:53.571433+0000 | 8537'1040 | 10088:13580 | [20,3] | 20 | [20,3,28] | 20 | 8537'1040 | 2022-02-09T18:39:53.571328+0000 | 8537'1040 | 2022-02-09T18:39:53.571328+0000 | 0 |
7.e | 63 | 0 | 0 | 63 | 0 | 2.6E+08 | 0 | 0 | 591 | 591 | active+clean+remapped | 2022-02-10T11:40:11.560673+0000 | 8442'591 | 10088:11607 | [25,3] | 25 | [25,3,33] | 25 | 8442'591 | 2022-02-10T11:40:11.560567+0000 | 8442'591 | 2022-02-10T11:40:11.560567+0000 | 0 |
9.d | 29 | 0 | 0 | 29 | 0 | 1.17E+08 | 0 | 0 | 2448 | 2448 | active+clean+remapped | 2022-02-10T14:22:42.203264+0000 | 9784'2448 | 10088:16349 | [22,2] | 22 | [22,2,8] | 22 | 9784'2448 | 2022-02-10T14:22:42.203183+0000 | 9784'2448 | 2022-02-06T15:38:36.389808+0000 | 0 |
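(My best guess at how to check the mapping myself, in case that's the right direction, would be something like the sketch below. The rule number is a placeholder for whatever 'ceph osd pool ls detail' reports; pool 7 is one of the pools with a remapped PG above.)

  # Does the CRUSH rule find enough OSDs for every input?
  ceph osd getcrushmap -o crushmap.bin
  crushtool -i crushmap.bin --test --show-bad-mappings --rule 0 --num-rep 3

  # Or test the PG-to-OSD mapping for one pool against the current osdmap
  ceph osd getmap -o osdmap.bin
  osdmaptool osdmap.bin --test-map-pgs --pool 7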
Zach
On 2022-02-10 1:43 PM,
gfarnum@xxxxxxxxxx wrote:
"Up" is the set of OSDs which are alive from the calculated crush mapping. "Acting" includes those extras which have been added in to bring the PG up to proper size. So the PG does have 3 live OSDs serving it.
But perhaps the safety check *is* looking at up instead of acting? That seems like a plausible bug. (Also, if crush is failing to map properly, that's not a great sign for your cluster health or design.)
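(For a single PG, the two sets can be compared with something like the following; 7.11 is just one of the PG ids from the dump above:)

  # Print the up set and the acting set the osdmap currently has for this PG
  ceph pg map 7.11

  # Dump full peering/recovery state for the same PG
  ceph pg 7.11 query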
On Thu, Feb 10, 2022 at 11:26 AM Weiwen Hu <huww98@xxxxxxxxxxx> wrote:
I believe this is the reason.
I mean the number of OSDs in the "up" set should be at least 1 greater than the min_size for the upgrade to proceed. Otherwise, once any OSD is stopped, it can drop below min_size and prevent the PG from becoming active. So just clean up the misplaced PGs and the upgrade should proceed automatically.
But I'm a little confused. I think if you have only 2 up OSDs in a replica x3 pool, it should be in a degraded state, and should give you a HEALTH_WARN.
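(The relevant settings can be listed with something like this; "ssdpool" is just the pool name mentioned elsewhere in the thread, substitute your own:)

  # Show size, min_size and crush_rule for every pool in one listing
  ceph osd pool ls detail

  # Or query a single pool
  ceph osd pool get ssdpool min_size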
On 2022-02-11 at 03:06, Zach Heise (SSCC) <heise@xxxxxxxxxxxx> wrote:
Hi Weiwen, thanks for replying.
All of my replicated pools, including the ssdpool I made most recently, have a min_size of 2. My other two EC pools have a min_size of 3.
Looking at pg dump output again, it does look like the two EC pools have exactly 4 OSDs listed in the "Acting" column, and everything else has 3 OSDs in Acting. So that's as it should be, I believe?
I do have some 'misplaced' objects on 8 different PGs (the active+clean+remapped ones listed in my original ceph -s output); those PGs only have 2 "up" OSDs listed, but each has 3 OSDs in the "Acting" column as they should. Apparently these 231 misplaced objects aren't enough to cause ceph to drop out of HEALTH_OK status.
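(Those PGs and their up/acting sets can be listed with something like the following; the grep is just a convenience:)

  # List only the remapped PGs with their state, up and acting sets
  ceph pg dump pgs_brief | grep remapped
  # ('ceph pg ls remapped' should give a similar filtered listing)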
Zach
On 2022-02-10 12:41 PM, huww98@xxxxxxxxxxx wrote:
Hi Zach,
How about your min_size setting? Have you checked that the number of OSDs in the acting set of every PG is at least 1 greater than the min_size of the corresponding pool?
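(Something like the following should show it per pool; <pool> is a placeholder for each pool name:)

  ceph osd pool get <pool> size
  ceph osd pool get <pool> min_size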
Weiwen Hu
On 2022-02-10 at 05:02, Zach Heise (SSCC) <heise@xxxxxxxxxxxx> wrote:
Hello,
ceph health detail says my 5-node cluster is healthy, yet when I ran ceph orch upgrade start --ceph-version 16.2.7, everything seemed to go fine until we got to the OSD section. Now, for the past hour, every 15 seconds a new log entry of 'Upgrade: unsafe to stop osd(s) at this time (1 PGs are or would become offline)' appears in the logs.
ceph pg dump_stuck (unclean, degraded, etc.) shows "ok" for everything too. Yet somehow 1 PG is (apparently) holding up all the OSD upgrades and not letting the process finish. Should I stop the upgrade and try it again? (I haven't done that before, so I was just nervous to try it.) Any other ideas?
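(As far as I can tell, these are the commands for inspecting or stopping the upgrade, and for running the same safety check by hand; the OSD id is just an example:)

  # See where the orchestrator is in the upgrade
  ceph orch upgrade status

  # Stop the upgrade; it can be restarted later with the same
  # 'ceph orch upgrade start' command
  ceph orch upgrade stop

  # Ask the same question the upgrade's safety check asks, for one OSD
  ceph osd ok-to-stop 15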
  cluster:
    id:     9aa000e8-b999-11eb-82f2-ecf4bbcc0ac0
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum ceph05,ceph04,ceph01,ceph03 (age 92m)
    mgr: ceph03.futetp(active, since 97m), standbys: ceph01.fblojp
    mds: 1/1 daemons up, 1 hot standby
    osd: 33 osds: 33 up (since 2h), 33 in (since 4h); 9 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 193 pgs
    objects: 3.72k objects, 14 GiB
    usage:   43 GiB used, 64 TiB / 64 TiB avail
    pgs:     231/11170 objects misplaced (2.068%)
             185 active+clean
             8 active+clean+remapped

  io:
    client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

  progress:
    Upgrade to 16.2.7 (5m)
      [=====.......................] (remaining: 24m)
--
Zach