Weird monitor and mgr behavior after update.

Hello, I have updated my cluster from luminous to nautilus. The cluster is
working, but I am seeing some weird behavior in my monitors and managers.
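
In case the exact daemon versions matter, I can post the output of these
commands from one of the admin hosts; they should show which release every
mon, mgr and OSD is actually running and which feature bits the clients report:

[root@cwvh15 ~]# ceph versions
[root@cwvh15 ~]# ceph features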

The monitors are using a huge amount of memory and becoming very slow. Their
CPU usage is also much higher than it used to be.
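
To put numbers on the memory growth, I am assuming output like the following
would be useful (cwvh2 is just one example mon, and I am assuming the default
mon data path and a working admin socket):

[root@cwvh2 ~]# du -sh /var/lib/ceph/mon/ceph-cwvh2/store.db
[root@cwvh2 ~]# ceph daemon mon.cwvh2 perf dump
[root@cwvh2 ~]# top -b -n 1 -p $(pidof ceph-mon)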

The manager keeps restarting constantly, and it often reports all-zero values
for the PG counts, space usage, etc.
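
For the manager restarts, I guess the crash list and the recent mgr journal
would help; this assumes the standard systemd unit name on the node running
the active mgr (cwvh13):

[root@cwvh15 ~]# ceph crash ls
[root@cwvh13 ~]# journalctl -u ceph-mgr@cwvh13 --since "1 hour ago"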

I have also started receiving many slow ops warnings from the monitors.
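
If it is useful, I can also dump the pending monitor operations; as far as I
understand, the health detail and the mon admin socket op tracker should list
them:

[root@cwvh15 ~]# ceph health detail
[root@cwvh2 ~]# ceph daemon mon.cwvh2 ops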


Has anyone seen something like this?

What information about my cluster should I send so that more experienced users
can help me?
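
My guess is that something like the following would be a useful starting
point, but please tell me if anything else is needed (cwvh15 is just the host
I usually run admin commands from):

[root@cwvh15 ~]# ceph mon dump
[root@cwvh15 ~]# ceph config dump
[root@cwvh15 ~]# ceph osd pool ls detail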

This is one of the weird log messages:
2020-03-15 06:26:39.474 7f0396cae700 -1 mon.cwvh2@0(leader) e27
get_health_metrics reporting 603796 slow ops, oldest is pool_op(create
unmanaged snap pool 5 tid 1236847 name  v382060)

  cluster:
    id:     4ed305d4-5847-4e63-b9bb-168361cf2e81
    health: HEALTH_WARN
            1/3 mons down, quorum cwvh8,cwvh15

  services:
    mon: 3 daemons, quorum cwvh8,cwvh15 (age 4M), out of quorum: cwvh2
    mgr: cwvh13(active, since 33s), standbys: cwvh14
    osd: 100 osds: 100 up (since 17h), 100 in

  data:
    pools:   6 pools, 4160 pgs
    objects: 19.35M objects, 64 TiB
    usage:   132 TiB used, 94 TiB / 226 TiB avail
    pgs:     4154 active+clean
             6    active+clean+scrubbing+deep

  io:
    client:   192 MiB/s rd, 52 MiB/s wr, 1.25k op/s rd, 1.34k op/s wr


[root@cwvh15 ~]# ceph df
RAW STORAGE:
    CLASS      SIZE        AVAIL       USED        RAW USED     %RAW USED
    backup     120 TiB      62 TiB      58 TiB       58 TiB         48.64
    hdd         12 TiB     4.3 TiB     7.8 TiB      7.8 TiB         64.47
    ssd         94 TiB      28 TiB      66 TiB       66 TiB         70.40
    TOTAL      226 TiB      94 TiB     132 TiB      132 TiB         58.51

POOLS:
    POOL       ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    rbd         0        19 B           2     128 KiB         0        11 TiB
    CWVHDS      1     420 GiB     110.03k     848 GiB     39.36      653 GiB
    COMP        4     2.7 TiB     717.72k     5.5 TiB     81.19      653 GiB
    SSD         5      35 TiB       9.43M      66 TiB     88.08      4.4 TiB
    HDD         6     730 GiB     196.24k     1.4 TiB     52.84      653 GiB
    BKPR1       8      33 TiB       8.90M      58 TiB     56.73       22 TiB


