Hi,

Our Ceph cluster is reporting several PGs that have not been scrubbed or deep-scrubbed in time; for these PGs it has been more than a week since the last scrub. According to `ceph health detail`, there are 29 PGs not deep-scrubbed in time and 22 PGs not scrubbed in time. I tried to manually start a scrub on the PGs, but it appears that they are actually in an unclean state that needs to be resolved first.

This is a cluster running:

ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)

Following the information at [Troubleshooting PGs](https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/), I checked for PGs that are stuck stale | inactive | unclean (the exact commands I ran are in the P.S. at the end of this message). There were no PGs that are stale or inactive, but there are several that are stuck unclean:

```
PG_STAT  STATE                          UP                                UP_PRIMARY  ACTING                            ACTING_PRIMARY
8.3c     active+remapped+backfill_wait  [124,41,108,8,87,16,79,157,49]    124         [139,57,16,125,154,65,109,86,45]  139
8.3e     active+remapped+backfill_wait  [108,2,58,146,130,29,37,66,118]   108         [127,92,24,50,33,6,130,66,149]    127
8.3f     active+remapped+backfill_wait  [19,34,86,132,59,78,153,99,6]     19          [90,45,147,4,105,61,30,66,125]    90
8.40     active+remapped+backfill_wait  [19,131,80,76,42,101,61,3,144]    19          [28,106,132,3,151,36,65,60,83]    28
8.3a     active+remapped+backfilling    [32,72,151,30,103,131,62,84,120]  32          [91,60,7,133,101,117,78,20,158]   91
8.7e     active+remapped+backfill_wait  [108,2,58,146,130,29,37,66,118]   108         [127,92,24,50,33,6,130,66,149]    127
8.3b     active+remapped+backfill_wait  [34,113,148,63,18,95,70,129,13]   34          [66,17,132,90,14,52,101,47,115]   66
8.7f     active+remapped+backfill_wait  [19,34,86,132,59,78,153,99,6]     19          [90,45,147,4,105,61,30,66,125]    90
8.78     active+remapped+backfill_wait  [96,113,159,63,29,133,73,8,89]    96          [138,121,15,103,55,41,146,69,18]  138
8.7d     active+remapped+backfilling    [0,90,60,124,159,19,71,101,135]   0           [150,72,124,129,63,10,94,29,41]   150
8.7c     active+remapped+backfill_wait  [124,41,108,8,87,16,79,157,49]    124         [139,57,16,125,154,65,109,86,45]  139
8.79     active+remapped+backfill_wait  [59,15,41,82,131,20,73,156,113]   59          [13,51,120,102,29,149,42,79,132]  13
```

If I query one of the PGs that is backfilling, 8.3a, it shows its state as:

```
"recovery_state": [
    {
        "name": "Started/Primary/Active",
        "enter_time": "2020-09-19T20:45:44.027759+0000",
        "might_have_unfound": [],
        "recovery_progress": {
            "backfill_targets": [
                "30(3)",
                "32(0)",
                "62(6)",
                "72(1)",
                "84(7)",
                "103(4)",
                "120(8)",
                "131(5)",
                "151(2)"
            ],
```

Q1: Is there anything that I should check or fix to help these PGs resolve out of the `unclean` state?

Q2: I have also seen that the podman containers on one of our OSD servers are taking up a large amount of disk space. Is there a way to limit the growth of disk usage by podman containers when administering a Ceph cluster with the `cephadm` tooling? At last check, a server running 16 OSDs and 1 MON was using 39G of disk space for its running containers (see the P.S. for how I measured this). Can restarting the containers help start with a fresh slate or reduce the disk use?

Thanks,
Matt

------------------------
Matt Larson
Associate Scientist
Computer Scientist/System Administrator
UW-Madison Cryo-EM Research Center
433 Babcock Drive, Madison, WI 53706
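
P.S. For anyone who wants to reproduce the checks above, this is roughly what I ran (a minimal sketch; `8.3a` is just the example PG from my cluster, substitute your own PG ids):

```
# Overall health summary, including the "not (deep-)scrubbed in time" warnings
ceph health detail

# List PGs stuck in each of the states mentioned in the troubleshooting guide
ceph pg dump_stuck stale
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# Inspect one of the backfilling PGs in detail (recovery_state excerpt shown above)
ceph pg 8.3a query

# Manually request a scrub / deep scrub of a specific PG
ceph pg scrub 8.3a
ceph pg deep-scrub 8.3a
```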
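
For Q2, the 39G figure came from looking at container storage usage on the OSD host roughly like this (again just a sketch; exact paths may differ depending on how podman is set up on your hosts):

```
# Storage usage as seen by podman (run as root on the OSD host)
podman system df
podman ps --all --size

# Raw on-disk usage of podman's default image/container storage location
du -sh /var/lib/containers
```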