Re: PGs down

Igor Fedotov <ifedotov@xxxxxxx> · Sun, 13 Dec 2020 10:24:20 +0300

Hi Jeremy,

wondering what were the OSDs' logs when they crashed for the first time?

And does OSD.12 reports the similar problem for now:

3> 2020-12-12 20:23:45.756 7f2d21404700 -1 rocksdb: submit_common error: 
Corruption: block checksum mismatch: expected 3113305400, got 1242690251 
in db/000348.sst offset 47935290 size 4704 code = 2 Rocksdb transaction:

?

Thanks,
Igor
On 12/13/2020 8:48 AM, Jeremy Austin wrote:
I could use some input from more experienced folks…

First time seeing this behavior. I've been running ceph in production
(replicated) since 2016 or earlier.

This, however, is a small 3-node cluster for testing EC. Crush map rules
should sustain the loss of an entire node.
Here's the EC rule:

rule cephfs425 { id 6 type erasure min_size 3 max_size 6 step
set_chooseleaf_tries 40 step set_choose_tries 400 step take default step
choose indep 3 type host step choose indep 2 type osd step emit }

I had actual hardware failure on one node. Interestingly, this appears to
have resulted in data loss. OSDs began to crash in a cascade on other nodes
(i.e., nodes with no known hardware failure). Not a low RAM problem.

I could use some pointers about how to get the down PGs back up — I *think*
there are enough EC shards, even disregarding the OSDs that crash on start.

nautilus 14.2.15

  ceph osd tree
ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
  -1       54.75960 root default
-10       16.81067     host sumia
   1   hdd  5.57719         osd.1       up  1.00000 1.00000
   5   hdd  5.58469         osd.5       up  1.00000 1.00000
   6   hdd  5.64879         osd.6       up  1.00000 1.00000
  -7       16.73048     host sumib
   0   hdd  5.57899         osd.0       up  1.00000 1.00000
   2   hdd  5.56549         osd.2       up  1.00000 1.00000
   3   hdd  5.58600         osd.3       up  1.00000 1.00000
  -3       21.21844     host tower1
   4   hdd  3.71680         osd.4       up        0 1.00000
   7   hdd  1.84799         osd.7       up  1.00000 1.00000
   8   hdd  3.71680         osd.8       up  1.00000 1.00000
   9   hdd  1.84929         osd.9       up  1.00000 1.00000
  10   hdd  2.72899         osd.10      up  1.00000 1.00000
  11   hdd  3.71989         osd.11    down        0 1.00000
  12   hdd  3.63869         osd.12    down        0 1.00000

   cluster:
     id:     d0b4c175-02ba-4a64-8040-eb163002cba6
     health: HEALTH_ERR
             1 MDSs report slow requests
             4/4239345 objects unfound (0.000%)
             Too many repaired reads on 3 OSDs
             Reduced data availability: 7 pgs inactive, 7 pgs down
             Possible data damage: 4 pgs recovery_unfound
             Degraded data redundancy: 95807/24738783 objects degraded
(0.387%), 4 pgs degraded, 3 pgs undersized
             7 pgs not deep-scrubbed in time
             7 pgs not scrubbed in time

   services:
     mon: 3 daemons, quorum sumib,tower1,sumia (age 4d)
     mgr: sumib(active, since 7d), standbys: sumia, tower1
     mds: cephfs:1 {0=sumib=up:active} 2 up:standby
     osd: 13 osds: 11 up (since 3d), 10 in (since 4d); 3 remapped pgs

   data:
     pools:   5 pools, 256 pgs
     objects: 4.24M objects, 15 TiB
     usage:   24 TiB used, 24 TiB / 47 TiB avail
     pgs:     2.734% pgs not active
              95807/24738783 objects degraded (0.387%)
              47910/24738783 objects misplaced (0.194%)
              4/4239345 objects unfound (0.000%)
              245 active+clean
              7   down
              3   active+recovery_unfound+undersized+degraded+remapped
              1   active+recovery_unfound+degraded+repair

   progress:
     Rebalancing after osd.12 marked out
       [============================..]
     Rebalancing after osd.4 marked out
       [=============================.]

An snipped from an example down pg:
     "up": [
         3,
         2,
         5,
         1,
         8,
         9
     ],
     "acting": [
         3,
         2,
         5,
         1,
         8,
         9
     ],
<snip>
          ],
             "blocked": "peering is blocked due to down osds",
             "down_osds_we_would_probe": [
                 11,
                 12
             ],
             "peering_blocked_by": [
                 {
                     "osd": 11,
                     "current_lost_at": 0,
                     "comment": "starting or marking this osd lost may let
us proceed"
                 },
                 {
                     "osd": 12,
                     "current_lost_at": 0,
                     "comment": "starting or marking this osd lost may let
us proceed"
                 }
             ]
         },
         {

Oddly, these OSDs possibly did NOT experience hardware failure. However,
they won't start -- see pastebin for ceph-osd.11.log

https://pastebin.com/6U6sQJuJ

HEALTH_ERR 1 MDSs report slow requests; 4/4239345 objects unfound (0.000%);
Too many repaired reads on 3 OSDs; Reduced data availability
: 7 pgs inactive, 7 pgs down; Possible data damage: 4 pgs recovery_unfound;
Degraded data redundancy: 95807/24738783 objects degraded (0
.387%), 4 pgs degraded, 3 pgs undersized; 7 pgs not deep-scrubbed in time;
7 pgs not scrubbed in time
MDS_SLOW_REQUEST 1 MDSs report slow requests
     mdssumib(mds.0): 42 slow requests are blocked > 30 secs
OBJECT_UNFOUND 4/4239345 objects unfound (0.000%)
     pg 19.5 has 1 unfound objects
     pg 15.2f has 1 unfound objects
     pg 15.41 has 1 unfound objects
     pg 15.58 has 1 unfound objects
OSD_TOO_MANY_REPAIRS Too many repaired reads on 3 OSDs
     osd.9 had 9664 reads repaired
     osd.7 had 9665 reads repaired
     osd.4 had 12 reads repaired
PG_AVAILABILITY Reduced data availability: 7 pgs inactive, 7 pgs down
     pg 15.10 is down, acting [3,2,5,1,8,9]
     pg 15.1e is down, acting [5,1,9,8,2,3]
     pg 15.40 is down, acting [7,10,1,5,3,2]
     pg 15.4a is down, acting [0,3,5,6,9,10]
     pg 15.6a is down, acting [3,2,6,1,10,8]
     pg 15.71 is down, acting [3,2,1,6,8,10]
     pg 15.76 is down, acting [2,0,6,5,10,9]
PG_DAMAGED Possible data damage: 4 pgs recovery_unfound
     pg 15.2f is active+recovery_unfound+undersized+degraded+remapped,
acting [5,1,0,3,2147483647,7], 1 unfound
     pg 15.41 is active+recovery_unfound+undersized+degraded+remapped,
acting [5,1,0,3,2147483647,2147483647], 1 unfound
     pg 15.58 is active+recovery_unfound+undersized+degraded+remapped,
acting [10,2147483647,2,3,1,5], 1 unfound
     pg 19.5 is active+recovery_unfound+degraded+repair, acting
[3,2,5,1,8,10], 1 unfound
PG_DEGRADED Degraded data redundancy: 95807/24738783 objects degraded
(0.387%), 4 pgs degraded, 3 pgs undersized
     pg 15.2f is stuck undersized for 635305.932075, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[5,1,0,3,2147483647,7]
     pg 15.41 is stuck undersized for 364298.836902, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[5,1,0,3,2147483647,2147483647]
     pg 15.58 is stuck undersized for 384461.110229, current state
active+recovery_unfound+undersized+degraded+remapped, last acting
[10,2147483647,2,3,1,5]
     pg 19.5 is active+recovery_unfound+degraded+repair, acting
[3,2,5,1,8,10], 1 unfound
PG_NOT_DEEP_SCRUBBED 7 pgs not deep-scrubbed in time
     pg 15.76 not deep-scrubbed since 2020-10-21 14:30:03.935228
     pg 15.71 not deep-scrubbed since 2020-10-21 12:20:46.235792
     pg 15.6a not deep-scrubbed since 2020-10-21 07:52:33.914083
     pg 15.10 not deep-scrubbed since 2020-10-22 03:24:40.465367
     pg 15.1e not deep-scrubbed since 2020-10-22 10:37:36.169959
     pg 15.40 not deep-scrubbed since 2020-10-23 05:33:35.208748
     pg 15.4a not deep-scrubbed since 2020-10-22 05:14:06.981035
PG_NOT_SCRUBBED 7 pgs not scrubbed in time
     pg 15.76 not scrubbed since 2020-10-24 08:12:40.090831
     pg 15.71 not scrubbed since 2020-10-25 05:22:40.573572
     pg 15.6a not scrubbed since 2020-10-24 15:03:09.189964
     pg 15.10 not scrubbed since 2020-10-24 16:25:08.826981
     pg 15.1e not scrubbed since 2020-10-24 16:05:03.080127
     pg 15.40 not scrubbed since 2020-10-24 11:58:04.290488
     pg 15.4a not scrubbed since 2020-10-24 11:32:44.573551
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx