I have found the logs showing the progress module failure:

debug 2022-10-25T05:06:08.877+0000 7f40868e7700 0 [rbd_support INFO root] execute_trash_remove: task={"sequence": 150, "id": "fcc864a0-9bde-4512-9f84-be10976613db", "message": "Removing image fulen-hdd/f3f237d2f7e304 from trash", "refs": {"action": "trash remove", "pool_name": "fulen-hdd", "pool_namespace": "", "image_id": "f3f237d2f7e304"}, "in_progress": true, "progress": 0.0}
debug 2022-10-25T05:06:08.884+0000 7f4106e90700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'progress' while running on mgr.naret-monitor03.escwyg: ('42efb95d-ceaa-4a91-a9b2-b91f65f1834d',)
debug 2022-10-25T05:06:08.884+0000 7f4106e90700 -1 progress.serve:
debug 2022-10-25T05:06:08.897+0000 7f4139e96700 0 log_channel(audit) log [DBG] : from='client.22182342 -' entity='client.combin' cmd=[{"format":"json","group_name":"combin","prefix":"fs subvolume info","sub_name":"combin-4b53e28d-2f59-11ed-8aa5-9aa9e2c5aae2","vol_name":"cephfs"}]: dispatch
debug 2022-10-25T05:06:08.884+0000 7f4106e90700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/progress/module.py", line 716, in serve
    self._process_pg_summary()
  File "/usr/share/ceph/mgr/progress/module.py", line 629, in _process_pg_summary
    ev = self._events[ev_id]
KeyError: '42efb95d-ceaa-4a91-a9b2-b91f65f1834d'

On 25.10.22, 09:58, "Lo Re Giuseppe" <giuseppe.lore@xxxxxxx> wrote:

Hi,

Some weeks ago we started to use pg autoscale on our pools. We run version 16.2.7.
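The traceback shows the progress module indexing its internal events dict with an event id that is no longer present. A minimal sketch of that failure mode and of a defensive lookup that would tolerate the stale id (variable names are made up for illustration; this is not the actual mgr code):

```python
# Illustration of the crash in _process_pg_summary: an event id captured
# earlier is looked up again after the event was removed from the dict.
events = {"42efb95d": {"message": "Global Recovery Event"}}

stale_id = "42efb95d"
del events[stale_id]  # e.g. the event completed and was cleaned up

# Direct indexing reproduces the unhandled exception:
try:
    ev = events[stale_id]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: '42efb95d'

# A defensive lookup simply skips ids that no longer exist:
ev = events.get(stale_id)
if ev is None:
    pass  # event already gone; nothing to update
```

The fix in later Ceph releases is along these lines: the module has to assume an event can disappear between iterations of its serve loop.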
Maybe a coincidence, maybe not, but for some weeks we have been experiencing mgr progress module failures:

"""
[root@naret-monitor01 ~]# ceph -s
  cluster:
    id:     63334166-d991-11eb-99de-40a6b72108d0
    health: HEALTH_ERR
            Module 'progress' has failed: ('346ee7e0-35f0-4fdf-960e-a36e7e2441e4',)
            1 pool(s) full

  services:
    mon: 3 daemons, quorum naret-monitor01,naret-monitor02,naret-monitor03 (age 5d)
    mgr: naret-monitor02.ciqvgv(active, since 6d), standbys: naret-monitor03.escwyg, naret-monitor01.suwugf
    mds: 1/1 daemons up, 2 standby
    osd: 760 osds: 760 up (since 4d), 760 in (since 4d); 10 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   32 pools, 6250 pgs
    objects: 977.79M objects, 3.6 PiB
    usage:   5.7 PiB used, 5.1 PiB / 11 PiB avail
    pgs:     4602612/5990777501 objects misplaced (0.077%)
             6214 active+clean
             25   active+clean+scrubbing+deep
             10   active+remapped+backfilling
             1    active+clean+scrubbing

  io:
    client:   243 MiB/s rd, 292 MiB/s wr, 1.68k op/s rd, 842 op/s wr
    recovery: 430 MiB/s, 109 objects/s

  progress:
    Global Recovery Event (14h)
      [===========================.]
      (remaining: 70s)
"""

In the mgr logs I see:

"""
debug 2022-10-20T23:09:03.859+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}
debug 2022-10-20T23:09:03.863+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 3 has overlapping roots: {-60, -1, -2}
debug 2022-10-20T23:09:03.866+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 5 has overlapping roots: {-60, -1, -2}
debug 2022-10-20T23:09:03.870+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-60, -1, -2}
debug 2022-10-20T23:09:03.873+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 10 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.877+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 11 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.880+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 12 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.884+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.887+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 14 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.891+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 15 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.894+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 26 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.898+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 28 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.901+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 29 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.905+0000 7fba5f300700 0 [pg_autoscaler ERROR root] pool 30 has overlapping roots: {-105, -60, -1, -2}
...
"""

Do you have any explanation/fix for these errors?
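For context on those messages: the pg_autoscaler cannot compute a PG target for a pool whose CRUSH rule can draw OSDs from more than one CRUSH root, because capacity cannot then be attributed to the pool unambiguously. The shape of the check amounts to something like the sketch below (an illustration only, not the actual autoscaler code; the pool ids and root ids are invented to mirror the log lines):

```python
# Sketch: given the set of CRUSH roots each pool's rule can reach
# (negative ids are CRUSH bucket ids, as in the log), flag every pool
# whose rule spans more than one root.
def pools_with_overlapping_roots(pool_roots):
    """Return {pool_id: roots} for pools whose rules span multiple roots."""
    return {pool: roots for pool, roots in pool_roots.items() if len(roots) > 1}

pool_roots = {
    2: {-60, -1},             # reachable from two roots -> autoscaler error
    10: {-105, -60, -1, -2},  # reachable from four roots -> autoscaler error
    42: {-1},                 # confined to a single root -> fine
}

for pool, roots in sorted(pools_with_overlapping_roots(pool_roots).items()):
    print(f"pool {pool} has overlapping roots: {sorted(roots)}")
```

The usual remedy is to adjust CRUSH rules (or device classes) so that each pool's rule resolves to exactly one root.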
Regards,

Giuseppe

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx