Dashboard issue slowing to a crawl - active ceph mgr process spiking to 600%+

Hello All,

I'm hoping I can get some help with an issue in the dashboard after a recent bare-metal Ceph upgrade from
Octopus to Quincy.

** Please note: this message originally described the problem as affecting only the Images tab. Shortly after writing it, I found the same issue on another cluster that was also upgraded from Octopus to Quincy 17.2.7 within the last few months, and there it affects every tab in the Ceph dashboard: the dashboard slows to a crawl until I restart or fail over the mgr. Both clusters run on top of Ubuntu 20.04.

Everything appears to be working fine besides the Block --> Images tab. It doesn't matter which node I fail
over to; reboots, reinstalling ceph-mgr-dashboard, and trying different browsers, clients, etc. make no difference.

The tab will not load the 4 RBDs I have. They appear in rbd ls, I can query them, and the connection on the end
appliance is fine. The loading icons spin indefinitely without any failure message. If I access the Images tab and
then move to any other tab in the dashboard, it lets me navigate but displays nothing until I either restart the
service on the active mgr or fail over to another one. In other words, everything works as expected until I access
this one tab. The checks and the recovery steps I use are sketched below.
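For reference, roughly what I run to verify the images and to recover the dashboard. The pool/image names are placeholders, the mgr name matches the status output further down, and I'm assuming the standard bare-metal systemd unit name:

    # the images themselves are fine from the CLI
    rbd ls <pool>
    rbd info <pool>/<image>

    # recovery option 1: restart the active mgr's service on its host
    systemctl restart ceph-mgr@osd31

    # recovery option 2: fail over to a standby mgr
    ceph mgr fail osd31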


When I use any other section of the dashboard, CPU utilization for ceph-mgr is normal, but when I access the
Images tab it spikes to as high as 600% and stays there until I restart the service or fail over the
active mgr.
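A quick way to watch just the mgr process while reproducing this (assuming a single ceph-mgr process on the host; %CPU above 100% means more than one core is busy):

    top -p $(pgrep -ox ceph-mgr)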

-- Active mgr before clicking Block > Images; the OSDs spike for a second but revert to around 5%:

top - 13:43:37 up 8 days, 23:09,  1 user,  load average: 8.08, 5.02, 4.37
Tasks: 695 total,   1 running, 694 sleeping,   0 stopped,   0 zombie
%Cpu(s):  7.0 us,  1.6 sy,  0.0 ni, 89.7 id,  1.2 wa,  0.0 hi,  0.5 si,  0.0 st

-----
MiB Mem : 128474.1 total,   6705.6 free,  65684.0 used,  56084.5 buff/cache
MiB Swap:  40927.0 total,  35839.3 free,   5087.7 used.  49253.0 avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  14156 ceph      20   0 3420632   1.9g  13668 S  55.3   1.5 864:49.51 ceph-osd
  13762 ceph      20   0 3421384   1.8g  13432 S  51.3   1.4 960:22.12 ceph-osd
  14163 ceph      20   0 3422352   1.7g  13016 S  50.0   1.3 902:41.19 ceph-osd
  13803 ceph      20   0 3469596   1.8g  13532 S  44.7   1.4 941:55.10 ceph-osd
  13774 ceph      20   0 3427560   1.7g  13656 S  38.7   1.4 932:02.51 ceph-osd
  13801 ceph      20   0 3439796   1.7g  13448 S  37.7   1.3 981:25.55 ceph-osd
  14025 ceph      20   0 3426360   1.8g  13780 S  36.4   1.4 994:00.75 ceph-osd
   9888 nobody    20   0  126100   8696      0 S  21.2   0.0   1106:19 node_exporter
 126798 ceph      20   0 1787824 528000  39464 S   7.9   0.4   0:14.84 ceph-mgr
  13795 ceph      20   0 3420252   1.7g  13264 S   7.6   1.4 990:00.61 ceph-osd
  13781 ceph      20   0 3484476   1.9g  13248 S   6.3   1.5   1040:10 ceph-osd
  13777 ceph      20   0 3408972   1.8g  13464 S   6.0   1.5   1026:21 ceph-osd
  13797 ceph      20   0 3432068   1.6g  13932 S   6.0   1.3 950:39.35 ceph-osd
  13779 ceph      20   0 3471668   1.7g  12728 S   5.6   1.3 984:53.80 ceph-osd
  13768 ceph      20   0 3496064   1.9g  13504 S   5.3   1.5 918:37.48 ceph-osd
  13786 ceph      20   0 3422044   1.6g  13456 S   5.3   1.3 974:29.08 ceph-osd
  13788 ceph      20   0 3454184   1.9g  13048 S   5.3   1.5 980:35.78 ceph-osd
  13776 ceph      20   0 3445680   1.7g  12880 S   5.0   1.3 998:30.58 ceph-osd
  13785 ceph      20   0 3409548   1.7g  13704 S   5.0   1.3 939:37.08 ceph-osd
  14152 ceph      20   0 3465284   1.7g  13840 S   5.0   1.4 959:39.42 ceph-osd
  10339 nobody    20   0 6256048 531428  60188 S   4.6   0.4 239:37.56 prometheus
  13802 ceph      20   0 3430696   1.8g  13872 S   4.6   1.4 924:15.74 ceph-osd
  13791 ceph      20   0 3498876   1.5g  12648 S   4.3   1.2 962:58.37 ceph-osd
  13800 ceph      20   0 3455268   1.7g  12404 S   4.3   1.3   1000:41 ceph-osd
  13790 ceph      20   0 3434364   1.6g  13516 S   3.3   1.3 974:16.46 ceph-osd
  14217 ceph      20   0 3443436   1.8g  13560 S   3.3   1.4 902:54.22 ceph-osd
  13526 ceph      20   0 1012048 499628  11244 S   3.0   0.4 349:35.28 ceph-mon
  13775 ceph      20   0 3367284   1.6g  13940 S   3.0   1.3 878:38.27 ceph-osd
  13784 ceph      20   0 3380960   1.8g  12892 S   3.0   1.4 910:50.47 ceph-osd
  13789 ceph      20   0 3432876   1.6g  12464 S   2.6   1.2 922:45.15 ceph-osd
  13804 ceph      20   0 3428120   1.9g  13192 S   2.6   1.5 865:31.30 ceph-osd
  14153 ceph      20   0 3432752   1.8g  12576 S   2.3   1.4 874:27.92 ceph-osd
  14192 ceph      20   0 3412640   1.9g  13512 S   2.3   1.5 923:01.97 ceph-osd
  13796 ceph      20   0 3433016   1.8g  13164 S   2.0   1.4 982:08.21 ceph-osd
  13798 ceph      20   0 3405708   1.6g  13508 S   2.0   1.3 873:50.34 ceph-osd
  13814 ceph      20   0 4243252   1.5g  13500 S   2.0   1.2   2020:41 ceph-osd
  13985 ceph      20   0 3487848   1.6g  13100 S   2.0   1.3 942:21.96 ceph-osd
  14001 ceph      20   0 4194336   1.9g  13460 S   2.0   1.5   2143:46 ceph-osd
  14186 ceph      20   0 3441852   1.5g  13360 S   2.0   1.2 956:30.81 ceph-osd
   7257 root      20   0   82332   3480   2984 S   0.3   0.0   9:22.50 irqbalance
   7269 syslog    20   0  224344   3648   2392 S   0.3   0.0   0:11.79 rsyslogd
  16621 472       20   0  898376  79800  11080 S   0.3   0.1 146:36.08 grafana
 104366 root      20   0       0      0      0 I   0.3   0.0   1:11.30 kworker/0:2-events
 125676 root      20   0       0      0      0 I   0.3   0.0   0:00.48 kworker/u104:7-public-bond
 127115 root      20   0   10172   4636   3392 R   0.3   0.0   0:00.10 top
      1 root      20   0  180712  16312   5652 S   0.0   0.0   1:31.85 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.15 kthreadd  
---

-- Active mgr about a minute later, after clicking Block > Images:

top - 13:44:28 up 8 days, 23:09,  1 user,  load average: 6.79, 5.11, 4.43
Tasks: 695 total,   1 running, 694 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.9 us,  2.1 sy,  0.0 ni, 83.5 id,  1.0 wa,  0.0 hi,  0.5 si,  0.0 st
MiB Mem : 128474.1 total,   6219.8 free,  66504.7 used,  55749.6 buff/cache
MiB Swap:  40927.0 total,  35837.0 free,   5090.0 used.  48435.7 avail Mem
   

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 126798 ceph      20   0 1854596 573432  41484 S 482.5   0.4   1:45.70 ceph-mgr
  14156 ceph      20   0 3420632   1.9g  13668 S  54.6   1.5 865:16.58 ceph-osd
  13762 ceph      20   0 3421384   1.9g  13432 S  51.7   1.5 960:47.47 ceph-osd
  13803 ceph      20   0 3469596   1.9g  13532 S  49.3   1.5 942:18.23 ceph-osd
  14163 ceph      20   0 3422352   1.9g  13016 S  49.3   1.5 903:05.84 ceph-osd
  13795 ceph      20   0 3420252   1.8g  13264 S   7.0   1.4 990:04.41 ceph-osd
  13777 ceph      20   0 3408972   1.9g  13464 S   5.6   1.5   1026:24 ceph-osd
  13786 ceph      20   0 3422044   1.6g  13456 S   5.3   1.3 974:32.09 ceph-osd
  13797 ceph      20   0 3432068   1.6g  13932 S   5.3   1.3 950:42.02 ceph-osd
  16621 472       20   0  898376  78776  11044 S   5.3   0.1 146:36.76 grafana
  13791 ceph      20   0 3498876   1.5g  12648 S   5.0   1.2 963:00.94 ceph-osd
  14001 ceph      20   0 4194336   1.9g  13460 S   5.0   1.5   2143:47 ceph-osd
   9888 nobody    20   0  126100   8696      0 S   4.6   0.0   1106:23 node_exporter
  13768 ceph      20   0 3496064   1.9g  13504 S   4.6   1.5 918:39.85 ceph-osd
  13776 ceph      20   0 3445680   1.7g  12880 S   4.6   1.3 998:32.98 ceph-osd
  13781 ceph      20   0 3484476   1.6g  13248 S   4.6   1.3   1040:13 ceph-osd
  13785 ceph      20   0 3409548   1.7g  13704 S   4.6   1.3 939:40.08 ceph-osd
  13788 ceph      20   0 3454184   1.9g  13048 S   4.6   1.5 980:38.24 ceph-osd
  13779 ceph      20   0 3471668   1.7g  12728 S   4.3   1.3 984:56.39 ceph-osd
  13800 ceph      20   0 3455268   1.7g  12404 S   4.3   1.3   1000:44 ceph-osd
  13802 ceph      20   0 3430696   1.8g  13872 S   4.0   1.4 924:18.14 ceph-osd
  14152 ceph      20   0 3465284   1.7g  13840 S   4.0   1.4 959:41.83 ceph-osd
  13796 ceph      20   0 3433016   1.8g  13164 S   3.0   1.4 982:09.66 ceph-osd
  13784 ceph      20   0 3380960   1.8g  12892 S   2.6   1.4 910:52.06 ceph-osd
  13790 ceph      20   0 3434364   1.6g  13516 S   2.6   1.3 974:17.99 ceph-osd
  13801 ceph      20   0 3439796   1.9g  13448 S   2.6   1.5 981:42.61 ceph-osd
  14153 ceph      20   0 3432752   1.8g  12576 S   2.6   1.4 874:29.30 ceph-osd
  14186 ceph      20   0 3441852   1.6g  13360 S   2.6   1.2 956:32.32 ceph-osd
  14192 ceph      20   0 3412640   1.9g  13512 S   2.6   1.5 923:03.59 ceph-osd
  13526 ceph      20   0 1012048 496332  11208 S   2.3   0.4 349:36.89 ceph-mon
  13789 ceph      20   0 3432876   1.6g  12464 S   2.3   1.2 922:46.59 ceph-osd
  13798 ceph      20   0 3405708   1.6g  13508 S   2.3   1.3 873:51.73 ceph-osd
  14217 ceph      20   0 3443436   1.8g  13560 S   2.3   1.4 902:55.74 ceph-osd
  13774 ceph      20   0 3427560   1.9g  13656 S   2.0   1.5 932:19.26 ceph-osd
  13775 ceph      20   0 3367284   1.6g  13940 S   2.0   1.3 878:39.76 ceph-osd
  13814 ceph      20   0 4243252   1.5g  13500 S   2.0   1.2   2020:42 ceph-osd
  13985 ceph      20   0 3487848   1.6g  13100 S   2.0   1.3 942:23.39 ceph-osd
  13804 ceph      20   0 3428120   1.9g  13192 S   1.7   1.5 865:32.71 ceph-osd
  14025 ceph      20   0 3426360   1.8g  13780 S   1.7   1.5 994:17.32 ceph-osd
  10339 nobody    20   0 6256048 537184  60136 S   1.0   0.4 239:38.44 prometheus
  17547 nobody    20   0  128448   8572      0 S   1.0   0.0  31:54.99 alertmanager
 127115 root      20   0   10172   4636   3392 R   0.7   0.0   0:00.43 top

---
OS: Ubuntu 20.04
128GB of memory
Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
X11SPL-F

---
Ceph Versions
{
    "mon": {
        "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 4
    },
    "osd": {
        "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 140
    },
    "mds": {
        "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 4
    },
    "ctdb": {
        "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 1
    },
    "rgw": {
        "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 2
    },
    "overall": {
        "ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)": 154
    }
}
---
  cluster:
    id:     388dda42-9dd0-4858-a978-b3dc4c3b9152
    health: HEALTH_OK
  services:
    mon:  3 daemons, quorum jarn29,jarn30,jarn31 (age 8d)
    mgr:  osd31(active, since 10m), standbys: osd30, osd29, osd32
    mds:  2/2 daemons up, 2 standby
    osd:  140 osds: 140 up (since 3d), 140 in (since 3d)
          flags noautoscale
    ctdb: 1 daemon active (1 hosts)
    rgw:  2 daemons active (2 hosts, 1 zones)
  data:
    volumes: 1/1 healthy
    pools:   19 pools, 5024 pgs
    objects: 792.78M objects, 865 TiB
    usage:   1.4 PiB used, 586 TiB / 2.0 PiB avail
    pgs:     5013 active+clean
             11   active+clean+scrubbing+deep
  io:
    client:   19 MiB/s rd, 102 MiB/s wr, 672 op/s rd, 351 op/s wr

---

I've run perf and included the ceph-mgr logs for the first system that is displaying the issue:

Perf - https://imgur.com/a/VMh4tDf
mgr log while accessing the RBD tab - https://pastebin.com/t96WCWfc
mgr logs prior to clicking the RBD tab - https://pastebin.com/e4dtuD3i
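In case it helps anyone reproduce the captures, this is roughly how they can be taken (the debug level is my choice rather than anything the dashboard requires, and I'm again assuming a single ceph-mgr process on the host):

    # raise mgr log verbosity, then click the Images tab
    ceph config set mgr debug_mgr 5/5

    # profile the active mgr while the tab is stuck loading
    perf top -g -p $(pgrep -ox ceph-mgr)

    # restore the default verbosity afterwards
    ceph config rm mgr debug_mgr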

---

Apologies for the formatting; this is my first time posting here.

If anything else is needed, please let me know!