Re: MGR failures and pg autoscaler

Hi Lo,

On Tue, 25 Oct 2022 at 18:01, Lo Re Giuseppe <giuseppe.lore@xxxxxxx> wrote:
>
> I have found the logs showing the progress module failure:
>
> debug 2022-10-25T05:06:08.877+0000 7f40868e7700  0 [rbd_support INFO root] execute_trash_remove: task={"sequence": 150, "id": "fcc864a0-9bde-4512-9f84-be10976613db", "message": "Removing image fulen-hdd/f3f237d2f7e304 from trash", "refs": {"action": "trash remove", "pool_name": "fulen-hdd", "pool_namespace": "", "image_id": "f3f237d2f7e304"}, "in_progress": true, "progress": 0.0}
> debug 2022-10-25T05:06:08.884+0000 7f4106e90700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'progress' while running on mgr.naret-monitor03.escwyg: ('42efb95d-ceaa-4a91-a9b2-b91f65f1834d',)
> debug 2022-10-25T05:06:08.884+0000 7f4106e90700 -1 progress.serve:
> debug 2022-10-25T05:06:08.897+0000 7f4139e96700  0 log_channel(audit) log [DBG] : from='client.22182342 -' entity='client.combin' cmd=[{"format":"json","group_name":"combin","prefix":"fs subvolume info","sub_name":"combin-4b53e28d-2f59-11ed-8aa5-9aa9e2c5aae2","vol_name":"cephfs"}]: dispatch
> debug 2022-10-25T05:06:08.884+0000 7f4106e90700 -1 Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/progress/module.py", line 716, in serve
>     self._process_pg_summary()
>   File "/usr/share/ceph/mgr/progress/module.py", line 629, in _process_pg_summary
>     ev = self._events[ev_id]
> KeyError: '42efb95d-ceaa-4a91-a9b2-b91f65f1834d'

I encountered a similar problem and reported it to this ML:

https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/Q7A3TM6Z3XMRJPRBSHWGGACR653ICWXT/
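
Looking at your traceback, the serve loop dies on a plain dictionary
lookup (ev = self._events[ev_id]), so it looks like an event can be
removed between the moment the module snapshots the event ids and the
moment it looks one up. A minimal Python sketch of that suspected
failure mode (the names here are illustrative, not the actual progress
module code):

    # Sketch of the suspected race, assuming the event table can shrink
    # while the serve loop works from a stale snapshot of its ids.
    events = {"42efb95d": "recovery event"}   # hypothetical event table
    pending_ids = list(events)                # snapshot of ids to process

    events.pop("42efb95d")                    # event cleared concurrently

    for ev_id in pending_ids:
        # events[ev_id] would raise KeyError here, as in your traceback;
        # a guarded lookup just skips ids that have since disappeared.
        ev = events.get(ev_id)
        if ev is None:
            continue
        print("updating", ev_id, ev)

Until the module itself is fixed, failing over to a standby (ceph mgr
fail naret-monitor02.ciqvgv, using your active mgr's name) should
restart the module and clear the failed state.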

I guess that you have multiple CRUSH rules and at least one pool that
uses the default root; the commands below should let you verify that.
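
Those pg_autoscaler errors point the same way: each CRUSH rule's "take"
step decides which root a pool maps to, and the autoscaler refuses to
compute targets for pools whose rules overlap across roots (the -60 and
-105 you see are presumably device-class shadow roots, which overlap
with the plain default root -1). Something like the following standard
CLI calls should show which root each of your pools ends up on (the
rule name in the last command is just an example):

    # which crush_rule each pool uses
    ceph osd pool ls detail

    # the hierarchy roots, including per-device-class shadow roots
    ceph osd crush tree --show-shadow

    # the "take" step of a rule shows the root it starts from
    ceph osd crush rule dump replicated_rule

If some pools take the bare default root while others take a
device-class root, giving every pool a rule that names a device class
is the usual way to stop the overlap.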
I'm not sure about the details of your question, but I hope this
information helps you.

Thanks,
Satoru

>
> On 25.10.22, 09:58, "Lo Re  Giuseppe" <giuseppe.lore@xxxxxxx> wrote:
>
>     Hi,
>     A few weeks ago we started to use the pg autoscaler on our pools.
>     We run version 16.2.7.
>     Maybe a coincidence, maybe not, but since then we have been seeing mgr progress module failures:
>
>     """
>     [root@naret-monitor01 ~]# ceph -s
>       cluster:
>         id:     63334166-d991-11eb-99de-40a6b72108d0
>         health: HEALTH_ERR
>                 Module 'progress' has failed: ('346ee7e0-35f0-4fdf-960e-a36e7e2441e4',)
>                 1 pool(s) full
>
>       services:
>         mon: 3 daemons, quorum naret-monitor01,naret-monitor02,naret-monitor03 (age 5d)
>         mgr: naret-monitor02.ciqvgv(active, since 6d), standbys: naret-monitor03.escwyg, naret-monitor01.suwugf
>         mds: 1/1 daemons up, 2 standby
>         osd: 760 osds: 760 up (since 4d), 760 in (since 4d); 10 remapped pgs
>         rgw: 3 daemons active (3 hosts, 1 zones)
>
>       data:
>         volumes: 1/1 healthy
>         pools:   32 pools, 6250 pgs
>         objects: 977.79M objects, 3.6 PiB
>         usage:   5.7 PiB used, 5.1 PiB / 11 PiB avail
>         pgs:     4602612/5990777501 objects misplaced (0.077%)
>                  6214 active+clean
>                  25   active+clean+scrubbing+deep
>                  10   active+remapped+backfilling
>                  1    active+clean+scrubbing
>
>       io:
>         client:   243 MiB/s rd, 292 MiB/s wr, 1.68k op/s rd, 842 op/s wr
>         recovery: 430 MiB/s, 109 objects/s
>
>       progress:
>         Global Recovery Event (14h)
>           [===========================.] (remaining: 70s)
>     """
>
>     In the mgr logs I see:
>     """
>     debug 2022-10-20T23:09:03.859+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}
>     debug 2022-10-20T23:09:03.863+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 3 has overlapping roots: {-60, -1, -2}
>     debug 2022-10-20T23:09:03.866+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 5 has overlapping roots: {-60, -1, -2}
>     debug 2022-10-20T23:09:03.870+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-60, -1, -2}
>     debug 2022-10-20T23:09:03.873+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 10 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.877+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 11 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.880+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 12 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.884+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.887+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 14 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.891+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 15 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.894+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 26 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.898+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 28 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.901+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 29 has overlapping roots: {-105, -60, -1, -2}
>     debug 2022-10-20T23:09:03.905+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 30 has overlapping roots: {-105, -60, -1, -2}
>     ...
>     """
>     Do you have any explanation/fix for these errors?
>     Regards,
>
>     Giuseppe
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



