Our small Ceph (Nautilus) cluster experienced a series of failures over a period of time, including the complete loss of one OSD node. I have been trying to restore the cluster since then, but keep running into one problem after another. Currently I have 19 PGs that are marked inactive + incomplete. As I cannot find in the docs what the best way to proceed is, any guidance would be appreciated. Though far from ideal, at this point I would accept some data loss in order to regain the operability of the cluster.

```
cephadmin@cephadmin:~$ ceph health detail
HEALTH_WARN Reduced data availability: 19 pgs inactive, 19 pgs incomplete; 8 pgs not deep-scrubbed in time; 1 pgs not scrubbed in time; 731 slow ops, oldest one blocked for 370 sec, daemons [osd.0,osd.12,osd.13,osd.3,osd.5,osd.7,osd.8] have slow ops.
PG_AVAILABILITY Reduced data availability: 19 pgs inactive, 19 pgs incomplete
    pg 6.0 is incomplete, acting [0,5,16]
    pg 6.2 is incomplete, acting [3,11,4]
    pg 6.4 is incomplete, acting [12,5,9]
    pg 6.5 is incomplete, acting [7,9,2]
    pg 6.6 is incomplete, acting [13,4,11]
    pg 6.7 is incomplete, acting [5,3,10]
    pg 6.8 is incomplete, acting [7,17,3]
    pg 6.a is incomplete, acting [5,0,9]
    pg 6.b is incomplete, acting [12,7,17]
    pg 6.c is incomplete, acting [12,14,17]
    pg 6.d is incomplete, acting [5,1,10]
    pg 6.f is incomplete, acting [12,16,14]
    pg 6.10 is incomplete, acting [0,8,15]
    pg 6.11 is incomplete, acting [12,11,4]
    pg 6.13 is incomplete, acting [3,11,14]
    pg 6.18 is incomplete, acting [0,7,10]
    pg 6.19 is incomplete, acting [5,1,10]
    pg 6.1d is incomplete, acting [13,8,5]
    pg 6.1e is incomplete, acting [8,5,3]
PG_NOT_DEEP_SCRUBBED 8 pgs not deep-scrubbed in time
    pg 6.a not deep-scrubbed since 2022-11-28 08:17:11.288522
    pg 6.d not deep-scrubbed since 2022-11-27 18:56:46.688560
    pg 6.0 not deep-scrubbed since 2022-11-26 14:20:28.053493
    pg 6.5 not deep-scrubbed since 2022-11-26 14:20:11.536393
    pg 6.4 not deep-scrubbed since 2022-11-27 16:13:21.598402
    pg 6.13 not deep-scrubbed since 2022-11-29 13:19:51.396276
    pg 6.11 not deep-scrubbed since 2022-11-27 13:37:21.547531
    pg 6.1e not deep-scrubbed since 2022-11-27 03:24:57.424219
PG_NOT_SCRUBBED 1 pgs not scrubbed in time
    pg 6.0 not scrubbed since 2022-12-02 00:56:14.670839
SLOW_OPS 731 slow ops, oldest one blocked for 370 sec, daemons [osd.0,osd.12,osd.13,osd.3,osd.5,osd.7,osd.8] have slow ops
```

Output of `ceph pg 6.0 query`:

```
[...]
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2022-12-12 14:43:43.832236",
            "comment": "not enough complete instances of this PG"
        },
        {
            "name": "Started/Primary/Peering",
            "enter_time": "2022-12-12 14:43:42.321160",
            "past_intervals": [
                {
                    "first": "29863",
                    "last": "31886",
                    "all_participants": [
[...]
            "peering_blocked_by": [],
            "peering_blocked_by_detail": [
                {
                    "detail": "peering_blocked_by_history_les_bound"
                }
            ]
        },
        {
            "name": "Started",
            "enter_time": "2022-12-12 14:43:42.321045"
        }
    ],
    "agent_state": {}
}
```

*Mami Hayashida*
*Systems Professional III*
ITS Research Computing Infrastructure
University of Kentucky
Lexington, Kentucky, 40506 (USA)
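
P.S. Based on the `peering_blocked_by_history_les_bound` detail in the query output, below is the rough sequence I have been considering, but I have not run any of it and would very much like confirmation before touching anything. The OSD and PG IDs are only placeholders taken from the output above, and I am assuming (not certain) that these are the relevant knobs for this state.

```
# Untested sketch only -- osd.5 and pg 6.0 are placeholders; I would
# substitute the actual acting OSDs / incomplete PGs.

# (a) Per the les_bound hint, allow peering to ignore the
#     last_epoch_started bound on the affected OSDs (my understanding
#     is that this is risky and should be reverted afterwards):
ceph config set osd.5 osd_find_best_info_ignore_history_les true

# (b) If one surviving copy can be treated as authoritative, mark the
#     PG complete on that OSD with the daemon stopped:
systemctl stop ceph-osd@5
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
    --pgid 6.0 --op mark-complete
systemctl start ceph-osd@5

# (c) Last resort, explicitly accepting data loss: recreate the PG empty.
ceph osd force-create-pg 6.0 --yes-i-really-mean-it
```

Is this the right general direction, or is there a safer way to get these PGs to peer again?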