Re: 17.2.7: Backfilling deadlock / stall / stuck / standstill

I faced a similar issue: the PG just wouldn't finish recovery. Changing
"osd_op_queue" to "wpq" on all OSDs in the PG and then restarting them
serially ultimately allowed the PG to recover. It seemed to be some issue
with mclock.
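
Roughly what that looks like, as a sketch (assuming a cephadm-managed
cluster; the OSD IDs below are just the acting set of your stuck PG, and
osd_op_queue only takes effect after an OSD restart):

    # switch the scheduler to wpq for the OSDs in the PG's acting set
    for id in 223 274 243 290 286 283; do
        ceph config set osd.$id osd_op_queue wpq
    done

    # then restart them one at a time, letting the cluster settle in between
    for id in 223 274 243 290 286 283; do
        ceph orch daemon restart osd.$id
        sleep 60   # crude pause; watch "ceph -s" before moving on
    done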

Respectfully,

*Wes Dillingham*
wes@xxxxxxxxxxxxxxxxx
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Fri, Jan 26, 2024 at 7:57 AM Kai Stian Olstad <ceph+list@xxxxxxxxxx>
wrote:

> Hi,
>
> This is a cluster running 17.2.7, upgraded from 16.2.6 on 15 January
> 2024.
>
> On Monday 22 January we had 4 HDDs, all on different servers, with I/O
> errors because of some damaged sectors. The OSDs are hybrid, so the DB is
> on SSD; 5 HDDs share 1 SSD.
> I set the OSDs out, "ceph osd out 223 269 290 318", and all hell broke
> loose.
>
> It took only minutes before the users complained about Ceph not working.
> Ceph status reported slow ops on the OSDs that were set out, and "ceph
> tell osd.<id> dump_ops_in_flight" against the out OSDs just hung; after
> 30 minutes I stopped the dump command.
> Long story short, I ended up running "ceph osd set nobackfill" until the
> slow ops were gone and then unsetting it when the slow ops message
> disappeared.
> I needed to run that all the time so the cluster didn't come to a halt,
> so this one-liner loop was used:
>
> while true; do ceph -s | grep -qE "oldest one blocked for [0-9]{2,}" &&
> (date; ceph osd set nobackfill; sleep 15; ceph osd unset nobackfill);
> sleep 10; done
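>
> Written out for readability, that loop is equivalent to:
>
> while true; do
>     # ops blocked for 10+ seconds (two or more digits) -> pause backfill
>     if ceph -s | grep -qE "oldest one blocked for [0-9]{2,}"; then
>         date
>         ceph osd set nobackfill
>         sleep 15
>         ceph osd unset nobackfill
>     fi
>     sleep 10
> done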
>
>
> But now, 4 days later, the backfilling has stopped progressing completely
> and the number of misplaced objects is increasing.
> Some PGs have 0 misplaced objects but are still in the backfilling state,
> and have been in this state for over 24 hours now.
>
> I have a hunch that it's because PG 404.6e7 is in the state
> "active+recovering+degraded+remapped"; it's been in this state for over
> 48 hours.
> It possibly has 2 missing objects, but since they are not unfound I can't
> delete them with "ceph pg 404.6e7 mark_unfound_lost delete".
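>
> For reference, the missing/unfound counters can be pulled out of the PG
> query with something like this (the jq filter is only an illustration):
>
> ceph pg 404.6e7 query | jq '.info.stats.stat_sum
>     | {num_objects_missing, num_objects_missing_on_primary, num_objects_unfound}'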
>
> Could someone please help solve this?
> Below is some output of ceph commands; I'll also attach them.
>
>
> ceph status (the only information removed is about scrubs and deep scrubs
> not running)
> ---
>    cluster:
>      id:     b321e76e-da3a-11eb-b75c-4f948441dcd0
>      health: HEALTH_WARN
>              Degraded data redundancy: 2/6294904971 objects degraded (0.000%), 1 pg degraded
>
>    services:
>      mon: 3 daemons, quorum ceph-mon-1,ceph-mon-2,ceph-mon-3 (age 11d)
>      mgr: ceph-mon-1.ptrsea(active, since 11d), standbys: ceph-mon-2.mfdanx
>      mds: 1/1 daemons up, 1 standby
>      osd: 355 osds: 355 up (since 22h), 351 in (since 4d); 18 remapped pgs
>      rgw: 7 daemons active (7 hosts, 1 zones)
>
>    data:
>      volumes: 1/1 healthy
>      pools:   14 pools, 3945 pgs
>      objects: 1.14G objects, 1.1 PiB
>      usage:   1.8 PiB used, 1.2 PiB / 3.0 PiB avail
>      pgs:     2/6294904971 objects degraded (0.000%)
>               2980455/6294904971 objects misplaced (0.047%)
>               3901 active+clean
>               22   active+clean+scrubbing+deep
>               17   active+remapped+backfilling
>               4    active+clean+scrubbing
>               1    active+recovering+degraded+remapped
>
>    io:
>      client:   167 MiB/s rd, 13 MiB/s wr, 6.02k op/s rd, 2.35k op/s wr
>
>
> ceph health detail (the only information removed is about scrubs and deep
> scrubs not running)
> ---
> HEALTH_WARN Degraded data redundancy: 2/6294902067 objects degraded (0.000%), 1 pg degraded
> [WRN] PG_DEGRADED: Degraded data redundancy: 2/6294902067 objects degraded (0.000%), 1 pg degraded
>      pg 404.6e7 is active+recovering+degraded+remapped, acting [223,274,243,290,286,283]
>
>
> ceph pg 404.6e7 list_unfound
> ---
> {
>      "num_missing": 2,
>      "num_unfound": 0,
>      "objects": [],
>      "state": "Active",
>      "available_might_have_unfound": true,
>      "might_have_unfound": [],
>      "more": false
> }
>
> ceph pg 404.6e7 query | jq .recovery_state
> ---
> [
>    {
>      "name": "Started/Primary/Active",
>      "enter_time": "2024-01-26T09:08:41.918637+0000",
>      "might_have_unfound": [
>        {
>          "osd": "243(2)",
>          "status": "already probed"
>        },
>        {
>          "osd": "274(1)",
>          "status": "already probed"
>        },
>        {
>          "osd": "275(0)",
>          "status": "already probed"
>        },
>        {
>          "osd": "283(5)",
>          "status": "already probed"
>        },
>        {
>          "osd": "286(4)",
>          "status": "already probed"
>        },
>        {
>          "osd": "290(3)",
>          "status": "already probed"
>        },
>        {
>          "osd": "335(3)",
>          "status": "already probed"
>        }
>      ],
>      "recovery_progress": {
>        "backfill_targets": [
>          "275(0)",
>          "335(3)"
>        ],
>        "waiting_on_backfill": [],
>        "last_backfill_started": "404:e76011a9:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.18_56463c71-286c-4399-8d5d-0c278b7c97fd:head",
>        "backfill_info": {
>          "begin": "MIN",
>          "end": "MIN",
>          "objects": []
>        },
>        "peer_backfill_info": [],
>        "backfills_in_flight": [],
>        "recovering": [],
>        "pg_backend": {
>          "recovery_ops": [],
>          "read_ops": []
>        }
>      }
>    },
>    {
>      "name": "Started",
>      "enter_time": "2024-01-26T09:08:40.909151+0000"
>    }
> ]
>
>
> ceph pg ls recovering backfilling
> ---
> PG       OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES         OMAP_BYTES*  OMAP_KEYS*  LOG    LOG_DUPS  STATE                                SINCE  VERSION          REPORTED          UP                             ACTING
> 404.bc    287986         0          0        0  512046716673            0           0  10091         0           active+recovering+remapped     2h   217988'1385478   217988:10897565  [193,297,279,276,136,197]p193  [223,297,269,276,136,197]p223
> 404.c4    288236         0     288236        0  511669837559            0           0  10063         0          active+remapped+backfilling    24h   217988'1378228   217988:11719855  [156,186,178,345,339,177]p156  [223,186,178,345,339,177]p223
> 404.12a   287544         0          0        0  512246100354            0           0  10009         0          active+remapped+backfilling    24h   217988'1392371   217988:13739524  [248,178,250,145,304,272]p248  [223,178,250,145,304,272]p223
> 404.1c1   287739         0     286969        0  511800674008            0           0  10047         0          active+remapped+backfilling     2d   217988'1402889   217988:10975174  [332,246,183,169,280,255]p332  [318,246,183,169,280,255]p318
> 404.258   287737         0     277111        0  510099501390            0           0  10077         0          active+remapped+backfilling    24h   217988'1451778   217988:12780104  [308,199,134,342,188,221]p308  [318,199,134,342,188,221]p318
> 404.269   287990         0          0        0  512343190608            0           0  10043         0          active+remapped+backfilling    24h   217988'1358446   217988:14020217  [275,205,283,247,211,292]p275  [223,205,283,247,211,292]p223
> 404.34e   287624         0     277899        0  510447074297            0           0  10002         0          active+remapped+backfilling    24h   217988'1392933   217988:12636557  [322,141,338,168,251,218]p322  [318,141,338,168,251,218]p318
> 404.39c   287844         0     286692        0  512947685682            0           0  10017         0          active+remapped+backfilling     2d   217988'1414697   217988:11004944  [288,188,131,299,295,181]p288  [318,188,131,299,295,181]p318
> 404.511   287589         0          0        0  512014863711            0           0  10057         0          active+remapped+backfilling    24h   217988'1368741   217988:11544729  [166,151,327,333,186,150]p166  [223,151,327,333,186,150]p223
> 404.5f1   288126         0     286621        0  510850256945            0           0  10071         0          active+remapped+backfilling    24h   217988'1365831   217988:10348125  [214,332,289,184,255,160]p214  [223,332,289,184,255,160]p223
> 404.62a   288035         0          0        0  511318662269            0           0  10014         0          active+remapped+backfilling     3h   217988'1358010   217988:12528704  [322,260,259,319,149,152]p322  [318,260,259,319,149,152]p318
> 404.63d   287372         0     286559        0  508783837699            0           0  10074         0          active+remapped+backfilling    24h   217988'1402174   217988:11685744  [303,307,186,350,161,267]p303  [318,307,186,350,161,267]p318
> 404.6e3   288110         0          0        0  509047569016            0           0  10049         0          active+remapped+backfilling    24h   217988'1368547   217988:12202278  [166,317,233,144,337,240]p166  [223,317,233,144,337,240]p223
> 404.6e7   287856         2          2        0  510383394904            0           0  10047         0  active+recovering+degraded+remapped     3h   217988'1356501   217988:13157749  [275,274,243,335,286,283]p275  [223,274,243,290,286,283]p223
> 404.7d2   287619         0     286026        0  510708533087            0           0  10093         0          active+remapped+backfilling     3d   217988'1397393   217988:12146656  [185,139,299,222,155,149]p185  [223,139,299,222,155,149]p223
> 412.119   711468         0          0        0  207473602580            0           0  10099         0          active+remapped+backfilling    24h  217988'21613330   217988:87589096  [352,207,292,314,230,262]p352  [318,207,292,314,230,262]p318
> 412.12f   711529         0     701279        0  208498170310            0           0  10033         0          active+remapped+backfilling    24h  217988'14873593   217988:86198113  [303,305,183,215,130,244]p303  [318,305,183,215,130,244]p318
> 412.1fb   713044         0       3166        0  207787641403            0           0  10097         0          active+remapped+backfilling     2d  217988'14893270  217988:105346132  [156,137,228,241,262,353]p156  [223,137,228,241,262,353]p223
>
>
> ceph osd tree out
> ---
> ID   CLASS  WEIGHT      TYPE NAME             STATUS  REWEIGHT  PRI-AFF
>   -1         3112.43481  root default
> -67          192.35847      host ceph-hd-001
> 269    hdd    12.82390          osd.269           up         0  1.00000
> -49          192.35847      host ceph-hd-003
> 223    hdd    12.82390          osd.223           up         0  1.00000
> -73          192.35847      host ceph-hd-011
> 290    hdd    12.82390          osd.290           up         0  1.00000
> -79          192.35847      host ceph-hd-014
> 318    hdd    12.82390          osd.318           up         0  1.00000
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



