Our small Ceph (Nautilus) cluster experienced a series of failures over a
period of time, including the complete loss of one OSD node. I have been
trying to restore the cluster since then, but keep running into one
problem after another. Currently I have 19 PGs that are marked inactive +
incomplete. Since I cannot find in the docs what the best way to proceed
is, any guidance would be appreciated. Though far from ideal, I would
accept some data loss at this point in order to get the cluster
operational again.

Current `ceph health detail` output:
```
cephadmin@cephadmin:~$ ceph health detail
HEALTH_WARN Reduced data availability: 19 pgs inactive, 19 pgs incomplete;
8 pgs not deep-scrubbed in time; 1 pgs not scrubbed in time; 731 slow ops,
oldest one blocked for 370 sec, daemons
[osd.0,osd.12,osd.13,osd.3,osd.5,osd.7,osd.8] have slow ops.
PG_AVAILABILITY Reduced data availability: 19 pgs inactive, 19 pgs
incomplete
pg 6.0 is incomplete, acting [0,5,16]
pg 6.2 is incomplete, acting [3,11,4]
pg 6.4 is incomplete, acting [12,5,9]
pg 6.5 is incomplete, acting [7,9,2]
pg 6.6 is incomplete, acting [13,4,11]
pg 6.7 is incomplete, acting [5,3,10]
pg 6.8 is incomplete, acting [7,17,3]
pg 6.a is incomplete, acting [5,0,9]
pg 6.b is incomplete, acting [12,7,17]
pg 6.c is incomplete, acting [12,14,17]
pg 6.d is incomplete, acting [5,1,10]
pg 6.f is incomplete, acting [12,16,14]
pg 6.10 is incomplete, acting [0,8,15]
pg 6.11 is incomplete, acting [12,11,4]
pg 6.13 is incomplete, acting [3,11,14]
pg 6.18 is incomplete, acting [0,7,10]
pg 6.19 is incomplete, acting [5,1,10]
pg 6.1d is incomplete, acting [13,8,5]
pg 6.1e is incomplete, acting [8,5,3]
PG_NOT_DEEP_SCRUBBED 8 pgs not deep-scrubbed in time
pg 6.a not deep-scrubbed since 2022-11-28 08:17:11.288522
pg 6.d not deep-scrubbed since 2022-11-27 18:56:46.688560
pg 6.0 not deep-scrubbed since 2022-11-26 14:20:28.053493
pg 6.5 not deep-scrubbed since 2022-11-26 14:20:11.536393
pg 6.4 not deep-scrubbed since 2022-11-27 16:13:21.598402
pg 6.13 not deep-scrubbed since 2022-11-29 13:19:51.396276
pg 6.11 not deep-scrubbed since 2022-11-27 13:37:21.547531
pg 6.1e not deep-scrubbed since 2022-11-27 03:24:57.424219
PG_NOT_SCRUBBED 1 pgs not scrubbed in time
pg 6.0 not scrubbed since 2022-12-02 00:56:14.670839
SLOW_OPS 731 slow ops, oldest one blocked for 370 sec, daemons
[osd.0,osd.12,osd.13,osd.3,osd.5,osd.7,osd.8] have slow ops
```
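All 19 incomplete PGs belong to pool 6. For reference, these are the
(entirely standard) commands I have been using to enumerate and inspect
them:
```
# list the stuck/inactive PGs in one place (they are all in pool 6)
ceph pg dump_stuck inactive

# peering detail for one of the affected PGs
ceph pg 6.0 query
```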
Output of `ceph pg 6.0 query` (trimmed to the recovery_state section):
```
[...]
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2022-12-12 14:43:43.832236",
            "comment": "not enough complete instances of this PG"
        },
        {
            "name": "Started/Primary/Peering",
            "enter_time": "2022-12-12 14:43:42.321160",
            "past_intervals": [
                {
                    "first": "29863",
                    "last": "31886",
                    "all_participants": [
                        [...]
            "peering_blocked_by": [],
            "peering_blocked_by_detail": [
                {
                    "detail": "peering_blocked_by_history_les_bound"
                }
            ]
        },
        {
            "name": "Started",
            "enter_time": "2022-12-12 14:43:42.321045"
        }
    ],
    "agent_state": {}
}
```
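From digging through past threads I have collected a few possible ways
forward, but they all look risky and I have not run any of them yet. The
commands below are just my notes, using pg 6.0 and its primary osd.0 as
the example and assuming the default /var/lib/ceph/osd/ceph-N paths and
systemd-managed OSDs. Is one of these the right direction, or is there a
better approach?
```
# 1) Let peering ignore the last_epoch_started history check that appears
#    to be blocking it (peering_blocked_by_history_les_bound above);
#    reportedly risky, and should be reverted once the PG peers.
ceph config set osd.0 osd_find_best_info_ignore_history_les true
# ...repeat for the other OSDs in the acting set, then revert with:
ceph config rm osd.0 osd_find_best_info_ignore_history_les

# 2) With the OSD stopped, export a copy of the PG and then mark it
#    complete on the primary with ceph-objectstore-tool.
systemctl stop ceph-osd@0
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --pgid 6.0 --op export --file /root/pg6.0.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --pgid 6.0 --op mark-complete
systemctl start ceph-osd@0

# 3) Last resort, accepting the loss of whatever is in that PG:
#    recreate it empty.
ceph osd force-create-pg 6.0 --yes-i-really-mean-it
```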
*Mami Hayashida*
*Systems Professional III*
ITS Research Computing Infrastructure
University of Kentucky
Lexington, Kentucky, 40506 (USA)