MGR failures and pg autoscaler


 



Hi,
A few weeks ago we started using the PG autoscaler on our pools.
We run Ceph version 16.2.7.
Maybe it is a coincidence, maybe not, but around the same time we started to experience failures of the mgr progress module:

“””
[root@naret-monitor01 ~]# ceph -s
  cluster:
    id:     63334166-d991-11eb-99de-40a6b72108d0
    health: HEALTH_ERR
            Module 'progress' has failed: ('346ee7e0-35f0-4fdf-960e-a36e7e2441e4',)
            1 pool(s) full

  services:
    mon: 3 daemons, quorum naret-monitor01,naret-monitor02,naret-monitor03 (age 5d)
    mgr: naret-monitor02.ciqvgv(active, since 6d), standbys: naret-monitor03.escwyg, naret-monitor01.suwugf
    mds: 1/1 daemons up, 2 standby
    osd: 760 osds: 760 up (since 4d), 760 in (since 4d); 10 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   32 pools, 6250 pgs
    objects: 977.79M objects, 3.6 PiB
    usage:   5.7 PiB used, 5.1 PiB / 11 PiB avail
    pgs:     4602612/5990777501 objects misplaced (0.077%)
             6214 active+clean
             25   active+clean+scrubbing+deep
             10   active+remapped+backfilling
             1    active+clean+scrubbing

  io:
    client:   243 MiB/s rd, 292 MiB/s wr, 1.68k op/s rd, 842 op/s wr
    recovery: 430 MiB/s, 109 objects/s

  progress:
    Global Recovery Event (14h)
      [===========================.] (remaining: 70s)
“””

In the mgr logs I see:
“””

debug 2022-10-20T23:09:03.859+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}

debug 2022-10-20T23:09:03.863+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 3 has overlapping roots: {-60, -1, -2}

debug 2022-10-20T23:09:03.866+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 5 has overlapping roots: {-60, -1, -2}

debug 2022-10-20T23:09:03.870+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-60, -1, -2}

debug 2022-10-20T23:09:03.873+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 10 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.877+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 11 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.880+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 12 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.884+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.887+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 14 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.891+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 15 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.894+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 26 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.898+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 28 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.901+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 29 has overlapping roots: {-105, -60, -1, -2}

debug 2022-10-20T23:09:03.905+0000 7fba5f300700  0 [pg_autoscaler ERROR root] pool 30 has overlapping roots: {-105, -60, -1, -2}

...
“””
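
To make the pattern in these messages easier to see, the pools can be grouped by the set of roots they report. The following is only a rough sketch: the regular expression assumes the exact message format shown in the log excerpt above, and the function name `group_overlapping_roots` is made up for illustration, not part of Ceph.

```python
import re
from collections import defaultdict

# Assumed message format, copied from the pg_autoscaler log lines above.
# [^}]* also spans line breaks, so wrapped root lists are still matched.
PATTERN = re.compile(r"pool (\d+) has overlapping roots: \{([^}]*)\}")


def group_overlapping_roots(log_text):
    """Map each set of CRUSH root IDs to the sorted list of pools reporting it."""
    groups = defaultdict(set)
    for match in PATTERN.finditer(log_text):
        pool_id = int(match.group(1))
        # int() ignores surrounding whitespace, including newlines.
        roots = frozenset(
            int(part) for part in match.group(2).split(",") if part.strip()
        )
        groups[roots].add(pool_id)
    return {roots: sorted(pools) for roots, pools in groups.items()}


if __name__ == "__main__":
    # Two lines taken from the excerpt above, shortened to the message part.
    sample = (
        "[pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}\n"
        "[pg_autoscaler ERROR root] pool 3 has overlapping roots: {-60, -1, -2}\n"
    )
    for roots, pools in group_overlapping_roots(sample).items():
        print(sorted(roots), "->", pools)
```

Run against the full mgr log, this would show which pools share the same combination of roots, which may help narrow down the CRUSH rules involved.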
Do you have any explanation or fix for these errors?
Regards,

Giuseppe

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



