Our small Ceph (Nautilus) cluster experienced a series of failures over a period of time, including the complete loss of one OSD node. I have been trying to restore the cluster since then, but keep running into one problem after another. Currently I have 19 PGs that are marked inactive + incomplete. As I cannot find in the docs what the best way to proceed is, any guidance would be appreciated. Though far from ideal, at this point I would accept some data loss in order to regain the operability of the cluster.

```
cephadmin@cephadmin:~$ ceph health detail
HEALTH_WARN Reduced data availability: 19 pgs inactive, 19 pgs incomplete; 8 pgs not deep-scrubbed in time; 1 pgs not scrubbed in time; 731 slow ops, oldest one blocked for 370 sec, daemons [osd.0,osd.12,osd.13,osd.3,osd.5,osd.7,osd.8] have slow ops.
PG_AVAILABILITY Reduced data availability: 19 pgs inactive, 19 pgs incomplete
    pg 6.0 is incomplete, acting [0,5,16]
    pg 6.2 is incomplete, acting [3,11,4]
    pg 6.4 is incomplete, acting [12,5,9]
    pg 6.5 is incomplete, acting [7,9,2]
    pg 6.6 is incomplete, acting [13,4,11]
    pg 6.7 is incomplete, acting [5,3,10]
    pg 6.8 is incomplete, acting [7,17,3]
    pg 6.a is incomplete, acting [5,0,9]
    pg 6.b is incomplete, acting [12,7,17]
    pg 6.c is incomplete, acting [12,14,17]
    pg 6.d is incomplete, acting [5,1,10]
    pg 6.f is incomplete, acting [12,16,14]
    pg 6.10 is incomplete, acting [0,8,15]
    pg 6.11 is incomplete, acting [12,11,4]
    pg 6.13 is incomplete, acting [3,11,14]
    pg 6.18 is incomplete, acting [0,7,10]
    pg 6.19 is incomplete, acting [5,1,10]
    pg 6.1d is incomplete, acting [13,8,5]
    pg 6.1e is incomplete, acting [8,5,3]
PG_NOT_DEEP_SCRUBBED 8 pgs not deep-scrubbed in time
    pg 6.a not deep-scrubbed since 2022-11-28 08:17:11.288522
    pg 6.d not deep-scrubbed since 2022-11-27 18:56:46.688560
    pg 6.0 not deep-scrubbed since 2022-11-26 14:20:28.053493
    pg 6.5 not deep-scrubbed since 2022-11-26 14:20:11.536393
    pg 6.4 not deep-scrubbed since 2022-11-27 16:13:21.598402
    pg 6.13 not deep-scrubbed since 2022-11-29 13:19:51.396276
    pg 6.11 not deep-scrubbed since 2022-11-27 13:37:21.547531
    pg 6.1e not deep-scrubbed since 2022-11-27 03:24:57.424219
PG_NOT_SCRUBBED 1 pgs not scrubbed in time
    pg 6.0 not scrubbed since 2022-12-02 00:56:14.670839
SLOW_OPS 731 slow ops, oldest one blocked for 370 sec, daemons [osd.0,osd.12,osd.13,osd.3,osd.5,osd.7,osd.8] have slow ops
```

Output of `ceph pg 6.0 query`:

```
[...]
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2022-12-12 14:43:43.832236",
            "comment": "not enough complete instances of this PG"
        },
        {
            "name": "Started/Primary/Peering",
            "enter_time": "2022-12-12 14:43:42.321160",
            "past_intervals": [
                {
                    "first": "29863",
                    "last": "31886",
                    "all_participants": [
[...]
            "peering_blocked_by": [],
            "peering_blocked_by_detail": [
                {
                    "detail": "peering_blocked_by_history_les_bound"
                }
            ]
        },
        {
            "name": "Started",
            "enter_time": "2022-12-12 14:43:42.321045"
        }
    ],
    "agent_state": {}
}
```

*Mami Hayashida*
*Systems Professional III*
ITS Research Computing Infrastructure
University of Kentucky
Lexington, Kentucky, 40506 (USA)
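
P.S. Based on the `peering_blocked_by_history_les_bound` detail in the query output, below is the rough sequence I have been considering, but I have not run any of it and would very much like confirmation before touching anything. The OSD and PG IDs are only placeholders taken from the output above, and I am assuming (not certain) that these are the relevant knobs for this state.

```
# Untested sketch only -- osd.5 and pg 6.0 are placeholders; I would
# substitute the actual acting OSDs / incomplete PGs.

# (a) Per the les_bound hint, allow peering to ignore the
#     last_epoch_started bound on the affected OSDs (my understanding
#     is that this is risky and should be reverted afterwards):
ceph config set osd.5 osd_find_best_info_ignore_history_les true

# (b) If one surviving copy can be treated as authoritative, mark the
#     PG complete on that OSD with the daemon stopped:
systemctl stop ceph-osd@5
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
    --pgid 6.0 --op mark-complete
systemctl start ceph-osd@5

# (c) Last resort, explicitly accepting data loss: recreate the PG empty.
ceph osd force-create-pg 6.0 --yes-i-really-mean-it
```

Is this the right general direction, or is there a safer way to get these PGs to peer again?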