Hi,
This is a cluster running 17.2.7, upgraded from 16.2.6 on 15 January 2024.
On Monday 22 January we had 4 HDDs, all on different servers, with I/O errors
because of some damaged sectors. The OSDs are hybrid, so the DB is on an SSD;
5 HDDs share 1 SSD.
I set the OSDs out with “ceph osd out 223 269 290 318”, and all hell broke
loose.
It took only minutes before the users complained about Ceph not working.
Ceph status reported slow ops on the OSDs that were set out, and running “ceph
tell osd.<id> dump_ops_in_flight” against the out OSDs just hung; after 30
minutes I stopped the dump command.
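(For reference, this is roughly how I could re-run the dump against the four
out OSDs without it hanging forever; the 30-second timeout is just an
arbitrary value I picked for illustration.)

# Try dump_ops_in_flight on each out OSD, but give up after 30s per OSD
# instead of letting the command hang indefinitely.
for id in 223 269 290 318; do
    echo "=== osd.$id ==="
    timeout 30 ceph tell osd.$id dump_ops_in_flight || echo "osd.$id: no reply within 30s"
done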
Long story short, I ended up running “ceph osd set nobackfill” until the slow
ops were gone, and then unsetting it when the slow ops message disappeared.
I needed to keep doing that so the cluster didn’t come to a halt, so I used
this one-liner loop:
“while true; do ceph -s | grep -qE "oldest one blocked for [0-9]{2,}" && (date; ceph osd set nobackfill; sleep 15; ceph osd unset nobackfill); sleep 10; done”
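(The same loop spread over several lines and commented, just to make it easier
to read:)

# Watchdog: whenever "ceph -s" reports an op blocked for 10+ seconds
# ("oldest one blocked for NN"), pause backfill for a moment so client
# I/O can get through, then re-enable it.
while true; do
    if ceph -s | grep -qE "oldest one blocked for [0-9]{2,}"; then
        date                          # note when we had to intervene
        ceph osd set nobackfill       # pause backfill
        sleep 15
        ceph osd unset nobackfill     # resume backfill
    fi
    sleep 10
done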
But now, 4 days later, the backfilling has stopped progressing completely and
the number of misplaced objects is increasing.
Some PGs have 0 misplaced objects but are still in the backfilling state, and
have been like that for over 24 hours now.
I have a hunch that it’s because PG 404.6e7 is in the state
“active+recovering+degraded+remapped”; it’s been in this state for over
48 hours.
It possibly has 2 missing objects, but since they are not unfound I can’t
delete them with “ceph pg 404.6e7 mark_unfound_lost delete”.
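(If it helps the diagnosis, I can also pull what each peer of the PG reports
as missing, with something along these lines; the jq paths are from memory,
so they may need adjusting:)

# Show how many objects each peer of PG 404.6e7 reports as missing,
# based on the peer_info section of "ceph pg query".
ceph pg 404.6e7 query | jq '.peer_info[] | {peer: .peer, missing: .stats.stat_sum.num_objects_missing}'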
Could someone please help me solve this?
Below is some output of ceph commands; I’ll also attach them.
ceph status (only removed the warnings about scrub and deep_scrub not running)
---
  cluster:
    id:     b321e76e-da3a-11eb-b75c-4f948441dcd0
    health: HEALTH_WARN
            Degraded data redundancy: 2/6294904971 objects degraded (0.000%), 1 pg degraded

  services:
    mon: 3 daemons, quorum ceph-mon-1,ceph-mon-2,ceph-mon-3 (age 11d)
    mgr: ceph-mon-1.ptrsea(active, since 11d), standbys: ceph-mon-2.mfdanx
    mds: 1/1 daemons up, 1 standby
    osd: 355 osds: 355 up (since 22h), 351 in (since 4d); 18 remapped pgs
    rgw: 7 daemons active (7 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 3945 pgs
    objects: 1.14G objects, 1.1 PiB
    usage:   1.8 PiB used, 1.2 PiB / 3.0 PiB avail
    pgs:     2/6294904971 objects degraded (0.000%)
             2980455/6294904971 objects misplaced (0.047%)
             3901 active+clean
             22   active+clean+scrubbing+deep
             17   active+remapped+backfilling
             4    active+clean+scrubbing
             1    active+recovering+degraded+remapped

  io:
    client: 167 MiB/s rd, 13 MiB/s wr, 6.02k op/s rd, 2.35k op/s wr
ceph health detail (only removed the warnings about scrub and deep_scrub not running)
---
HEALTH_WARN Degraded data redundancy: 2/6294902067 objects degraded (0.000%), 1 pg degraded
[WRN] PG_DEGRADED: Degraded data redundancy: 2/6294902067 objects degraded (0.000%), 1 pg degraded
    pg 404.6e7 is active+recovering+degraded+remapped, acting [223,274,243,290,286,283]
ceph pg 404.6e7 list_unfound
---
{
    "num_missing": 2,
    "num_unfound": 0,
    "objects": [],
    "state": "Active",
    "available_might_have_unfound": true,
    "might_have_unfound": [],
    "more": false
}
ceph pg 404.6e7 query | jq .recovery_state
---
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2024-01-26T09:08:41.918637+0000",
    "might_have_unfound": [
      {
        "osd": "243(2)",
        "status": "already probed"
      },
      {
        "osd": "274(1)",
        "status": "already probed"
      },
      {
        "osd": "275(0)",
        "status": "already probed"
      },
      {
        "osd": "283(5)",
        "status": "already probed"
      },
      {
        "osd": "286(4)",
        "status": "already probed"
      },
      {
        "osd": "290(3)",
        "status": "already probed"
      },
      {
        "osd": "335(3)",
        "status": "already probed"
      }
    ],
    "recovery_progress": {
      "backfill_targets": [
        "275(0)",
        "335(3)"
      ],
      "waiting_on_backfill": [],
      "last_backfill_started": "404:e76011a9:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.18_56463c71-286c-4399-8d5d-0c278b7c97fd:head",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": []
      }
    }
  },
  {
    "name": "Started",
    "enter_time": "2024-01-26T09:08:40.909151+0000"
  }
]
ceph pg ls recovering backfilling
---
PG  OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  LOG_DUPS  STATE  SINCE  VERSION  REPORTED  UP  ACTING
404.bc   287986  0  0       0  512046716673  0  0  10091  0  active+recovering+remapped           2h   217988'1385478   217988:10897565   [193,297,279,276,136,197]p193  [223,297,269,276,136,197]p223
404.c4   288236  0  288236  0  511669837559  0  0  10063  0  active+remapped+backfilling          24h  217988'1378228   217988:11719855   [156,186,178,345,339,177]p156  [223,186,178,345,339,177]p223
404.12a  287544  0  0       0  512246100354  0  0  10009  0  active+remapped+backfilling          24h  217988'1392371   217988:13739524   [248,178,250,145,304,272]p248  [223,178,250,145,304,272]p223
404.1c1  287739  0  286969  0  511800674008  0  0  10047  0  active+remapped+backfilling          2d   217988'1402889   217988:10975174   [332,246,183,169,280,255]p332  [318,246,183,169,280,255]p318
404.258  287737  0  277111  0  510099501390  0  0  10077  0  active+remapped+backfilling          24h  217988'1451778   217988:12780104   [308,199,134,342,188,221]p308  [318,199,134,342,188,221]p318
404.269  287990  0  0       0  512343190608  0  0  10043  0  active+remapped+backfilling          24h  217988'1358446   217988:14020217   [275,205,283,247,211,292]p275  [223,205,283,247,211,292]p223
404.34e  287624  0  277899  0  510447074297  0  0  10002  0  active+remapped+backfilling          24h  217988'1392933   217988:12636557   [322,141,338,168,251,218]p322  [318,141,338,168,251,218]p318
404.39c  287844  0  286692  0  512947685682  0  0  10017  0  active+remapped+backfilling          2d   217988'1414697   217988:11004944   [288,188,131,299,295,181]p288  [318,188,131,299,295,181]p318
404.511  287589  0  0       0  512014863711  0  0  10057  0  active+remapped+backfilling          24h  217988'1368741   217988:11544729   [166,151,327,333,186,150]p166  [223,151,327,333,186,150]p223
404.5f1  288126  0  286621  0  510850256945  0  0  10071  0  active+remapped+backfilling          24h  217988'1365831   217988:10348125   [214,332,289,184,255,160]p214  [223,332,289,184,255,160]p223
404.62a  288035  0  0       0  511318662269  0  0  10014  0  active+remapped+backfilling          3h   217988'1358010   217988:12528704   [322,260,259,319,149,152]p322  [318,260,259,319,149,152]p318
404.63d  287372  0  286559  0  508783837699  0  0  10074  0  active+remapped+backfilling          24h  217988'1402174   217988:11685744   [303,307,186,350,161,267]p303  [318,307,186,350,161,267]p318
404.6e3  288110  0  0       0  509047569016  0  0  10049  0  active+remapped+backfilling          24h  217988'1368547   217988:12202278   [166,317,233,144,337,240]p166  [223,317,233,144,337,240]p223
404.6e7  287856  2  2       0  510383394904  0  0  10047  0  active+recovering+degraded+remapped  3h   217988'1356501   217988:13157749   [275,274,243,335,286,283]p275  [223,274,243,290,286,283]p223
404.7d2  287619  0  286026  0  510708533087  0  0  10093  0  active+remapped+backfilling          3d   217988'1397393   217988:12146656   [185,139,299,222,155,149]p185  [223,139,299,222,155,149]p223
412.119  711468  0  0       0  207473602580  0  0  10099  0  active+remapped+backfilling          24h  217988'21613330  217988:87589096   [352,207,292,314,230,262]p352  [318,207,292,314,230,262]p318
412.12f  711529  0  701279  0  208498170310  0  0  10033  0  active+remapped+backfilling          24h  217988'14873593  217988:86198113   [303,305,183,215,130,244]p303  [318,305,183,215,130,244]p318
412.1fb  713044  0  3166    0  207787641403  0  0  10097  0  active+remapped+backfilling          2d   217988'14893270  217988:105346132  [156,137,228,241,262,353]p156  [223,137,228,241,262,353]p223
ceph osd tree out
---
ID   CLASS  WEIGHT      TYPE NAME             STATUS  REWEIGHT  PRI-AFF
 -1         3112.43481  root default
-67          192.35847      host ceph-hd-001
269    hdd    12.82390          osd.269           up         0  1.00000
-49          192.35847      host ceph-hd-003
223    hdd    12.82390          osd.223           up         0  1.00000
-73          192.35847      host ceph-hd-011
290    hdd    12.82390          osd.290           up         0  1.00000
-79          192.35847      host ceph-hd-014
318    hdd    12.82390          osd.318           up         0  1.00000