Re: Dashboard issue slowing to a crawl - active ceph mgr process spiking to 600%+

Hi,

It's a bit much output to scan through; I'd recommend omitting all unnecessary information before pasting. Anyway, this line sticks out:

2024-05-01T15:49:26.977+0000 7f85688e8700 0 [dashboard ERROR frontend.error] (https://172.20.2.30:8443/#/login): Http failure response for https://172.20.2.30:8443/api/osd/settings: 401 Unauthorized

Maybe it's just a role issue; you can change that in the dashboard or via the CLI. Can you verify that a user with full access can see the contents of the RBD tab?
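For the CLI route, something along these lines should show a user's current roles and grant full access (the username here is just a placeholder):

  ceph dashboard ac-user-show <username>
  ceph dashboard ac-user-set-roles <username> administrator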

Regards,
Eugen

Quoting Zachary Perry <zperry@xxxxxxxxxxxx>:

Hello All,

I'm hoping I can get some help with an issue in the dashboard after doing a recent bare metal ceph upgrade from
Octopus to Quincy.

** Please note: this message originally described the problem as affecting only the Images tab. Shortly after writing it, I found the same issue on another cluster that was upgraded from Octopus to Quincy 17.2.7 within the last few months, and there it affects all tabs in the ceph dashboard; the dashboard slows to a crawl until I restart or fail over the mgr. Both clusters run on top of Ubuntu 20.04.

Everything appears to be working fine besides the Block --> Images tab. It doesn't matter which node I fail over to; reboots, reinstalling ceph-mgr-dashboard, different browsers, clients, etc. make no difference.

It will not load the 4 RBDs I have. They appear in rbd ls, I can query them, and the connection on the end appliance is fine. The loading icons spin infinitely without any failure message. If I access the Images tab and then move to any other tab in the dashboard, it will let me navigate but not display anything until I either restart the service on the active mgr or fail over to another, so everything works as expected until I access this one tab.
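For reference, this is roughly how I verify the images from the CLI (pool and image names are just placeholders):

  rbd ls <pool>
  rbd info <pool>/<image>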


When I use any other section of the dashboard, CPU utilization for ceph-mgr is normal, but when I access the Images tab it spikes as high as 600% and stays like that until I restart the service or fail over the active mgr.
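The workaround, roughly, is either restarting the ceph-mgr service on the active node (bare metal, so via systemd) or failing over to a standby; the mgr name below is the active one from ceph -s:

  systemctl restart ceph-mgr@osd31
  # or
  ceph mgr fail osd31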

-- Active MGR before clicking Block --> Images; the OSDs spike for a second but revert to around 5%

top - 13:43:37 up 8 days, 23:09,  1 user,  load average: 8.08, 5.02, 4.37
Tasks: 695 total,   1 running, 694 sleeping,   0 stopped,   0 zombie
%Cpu(s): 7.0 us, 1.6 sy, 0.0 ni, 89.7 id, 1.2 wa, 0.0 hi, 0.5 si, 0.0 st

-----
MiB Mem : 128474.1 total,   6705.6 free,  65684.0 used,  56084.5 buff/cache
MiB Swap:  40927.0 total,  35839.3 free,   5087.7 used.  49253.0 avail Mem
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
14156 ceph 20 0 3420632 1.9g 13668 S 55.3 1.5 864:49.51 ceph-osd
13762 ceph 20 0 3421384 1.8g 13432 S 51.3 1.4 960:22.12 ceph-osd
14163 ceph 20 0 3422352 1.7g 13016 S 50.0 1.3 902:41.19 ceph-osd
13803 ceph 20 0 3469596 1.8g 13532 S 44.7 1.4 941:55.10 ceph-osd
13774 ceph 20 0 3427560 1.7g 13656 S 38.7 1.4 932:02.51 ceph-osd
13801 ceph 20 0 3439796 1.7g 13448 S 37.7 1.3 981:25.55 ceph-osd
14025 ceph 20 0 3426360 1.8g 13780 S 36.4 1.4 994:00.75 ceph-osd
9888 nobody 20 0 126100 8696 0 S 21.2 0.0 1106:19 node_exporter
126798 ceph 20 0 1787824 528000 39464 S 7.9 0.4 0:14.84 ceph-mgr
13795 ceph 20 0 3420252 1.7g 13264 S 7.6 1.4 990:00.61 ceph-osd
13781 ceph 20 0 3484476 1.9g 13248 S 6.3 1.5 1040:10 ceph-osd
13777 ceph 20 0 3408972 1.8g 13464 S 6.0 1.5 1026:21 ceph-osd
13797 ceph 20 0 3432068 1.6g 13932 S 6.0 1.3 950:39.35 ceph-osd
13779 ceph 20 0 3471668 1.7g 12728 S 5.6 1.3 984:53.80 ceph-osd
13768 ceph 20 0 3496064 1.9g 13504 S 5.3 1.5 918:37.48 ceph-osd
13786 ceph 20 0 3422044 1.6g 13456 S 5.3 1.3 974:29.08 ceph-osd
13788 ceph 20 0 3454184 1.9g 13048 S 5.3 1.5 980:35.78 ceph-osd
13776 ceph 20 0 3445680 1.7g 12880 S 5.0 1.3 998:30.58 ceph-osd
13785 ceph 20 0 3409548 1.7g 13704 S 5.0 1.3 939:37.08 ceph-osd
14152 ceph 20 0 3465284 1.7g 13840 S 5.0 1.4 959:39.42 ceph-osd
10339 nobody 20 0 6256048 531428 60188 S 4.6 0.4 239:37.56 prometheus
13802 ceph 20 0 3430696 1.8g 13872 S 4.6 1.4 924:15.74 ceph-osd
13791 ceph 20 0 3498876 1.5g 12648 S 4.3 1.2 962:58.37 ceph-osd
13800 ceph 20 0 3455268 1.7g 12404 S 4.3 1.3 1000:41 ceph-osd
13790 ceph 20 0 3434364 1.6g 13516 S 3.3 1.3 974:16.46 ceph-osd
14217 ceph 20 0 3443436 1.8g 13560 S 3.3 1.4 902:54.22 ceph-osd
13526 ceph 20 0 1012048 499628 11244 S 3.0 0.4 349:35.28 ceph-mon
13775 ceph 20 0 3367284 1.6g 13940 S 3.0 1.3 878:38.27 ceph-osd
13784 ceph 20 0 3380960 1.8g 12892 S 3.0 1.4 910:50.47 ceph-osd
13789 ceph 20 0 3432876 1.6g 12464 S 2.6 1.2 922:45.15 ceph-osd
13804 ceph 20 0 3428120 1.9g 13192 S 2.6 1.5 865:31.30 ceph-osd
14153 ceph 20 0 3432752 1.8g 12576 S 2.3 1.4 874:27.92 ceph-osd
14192 ceph 20 0 3412640 1.9g 13512 S 2.3 1.5 923:01.97 ceph-osd
13796 ceph 20 0 3433016 1.8g 13164 S 2.0 1.4 982:08.21 ceph-osd
13798 ceph 20 0 3405708 1.6g 13508 S 2.0 1.3 873:50.34 ceph-osd
13814 ceph 20 0 4243252 1.5g 13500 S 2.0 1.2 2020:41 ceph-osd
13985 ceph 20 0 3487848 1.6g 13100 S 2.0 1.3 942:21.96 ceph-osd
14001 ceph 20 0 4194336 1.9g 13460 S 2.0 1.5 2143:46 ceph-osd
14186 ceph 20 0 3441852 1.5g 13360 S 2.0 1.2 956:30.81 ceph-osd
7257 root 20 0 82332 3480 2984 S 0.3 0.0 9:22.50 irqbalance
7269 syslog 20 0 224344 3648 2392 S 0.3 0.0 0:11.79 rsyslogd
16621 472 20 0 898376 79800 11080 S 0.3 0.1 146:36.08 grafana
104366 root 20 0 0 0 0 I 0.3 0.0 1:11.30 kworker/0:2-events
125676 root 20 0 0 0 0 I 0.3 0.0 0:00.48 kworker/u104:7-public-bond
 127115 root      20   0   10172   4636   3392 R   0.3   0.0   0:00.10 top
1 root 20 0 180712 16312 5652 S 0.0 0.0 1:31.85 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.15 kthreadd
---

top - 13:44:28 up 8 days, 23:09,  1 user,  load average: 6.79, 5.11, 4.43
Tasks: 695 total,   1 running, 694 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.9 us, 2.1 sy, 0.0 ni, 83.5 id, 1.0 wa, 0.0 hi, 0.5 si, 0.0 st
MiB Mem : 128474.1 total,   6219.8 free,  66504.7 used,  55749.6 buff/cache
MiB Swap:  40927.0 total,  35837.0 free,   5090.0 used.  48435.7 avail Mem


 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
126798 ceph 20 0 1854596 573432 41484 S 482.5 0.4 1:45.70 ceph-mgr
14156 ceph 20 0 3420632 1.9g 13668 S 54.6 1.5 865:16.58 ceph-osd
13762 ceph 20 0 3421384 1.9g 13432 S 51.7 1.5 960:47.47 ceph-osd
13803 ceph 20 0 3469596 1.9g 13532 S 49.3 1.5 942:18.23 ceph-osd
14163 ceph 20 0 3422352 1.9g 13016 S 49.3 1.5 903:05.84 ceph-osd
13795 ceph 20 0 3420252 1.8g 13264 S 7.0 1.4 990:04.41 ceph-osd
13777 ceph 20 0 3408972 1.9g 13464 S 5.6 1.5 1026:24 ceph-osd
13786 ceph 20 0 3422044 1.6g 13456 S 5.3 1.3 974:32.09 ceph-osd
13797 ceph 20 0 3432068 1.6g 13932 S 5.3 1.3 950:42.02 ceph-osd
16621 472 20 0 898376 78776 11044 S 5.3 0.1 146:36.76 grafana
13791 ceph 20 0 3498876 1.5g 12648 S 5.0 1.2 963:00.94 ceph-osd
14001 ceph 20 0 4194336 1.9g 13460 S 5.0 1.5 2143:47 ceph-osd
9888 nobody 20 0 126100 8696 0 S 4.6 0.0 1106:23 node_exporter
13768 ceph 20 0 3496064 1.9g 13504 S 4.6 1.5 918:39.85 ceph-osd
13776 ceph 20 0 3445680 1.7g 12880 S 4.6 1.3 998:32.98 ceph-osd
13781 ceph 20 0 3484476 1.6g 13248 S 4.6 1.3 1040:13 ceph-osd
13785 ceph 20 0 3409548 1.7g 13704 S 4.6 1.3 939:40.08 ceph-osd
13788 ceph 20 0 3454184 1.9g 13048 S 4.6 1.5 980:38.24 ceph-osd
13779 ceph 20 0 3471668 1.7g 12728 S 4.3 1.3 984:56.39 ceph-osd
13800 ceph 20 0 3455268 1.7g 12404 S 4.3 1.3 1000:44 ceph-osd
13802 ceph 20 0 3430696 1.8g 13872 S 4.0 1.4 924:18.14 ceph-osd
14152 ceph 20 0 3465284 1.7g 13840 S 4.0 1.4 959:41.83 ceph-osd
13796 ceph 20 0 3433016 1.8g 13164 S 3.0 1.4 982:09.66 ceph-osd
13784 ceph 20 0 3380960 1.8g 12892 S 2.6 1.4 910:52.06 ceph-osd
13790 ceph 20 0 3434364 1.6g 13516 S 2.6 1.3 974:17.99 ceph-osd
13801 ceph 20 0 3439796 1.9g 13448 S 2.6 1.5 981:42.61 ceph-osd
14153 ceph 20 0 3432752 1.8g 12576 S 2.6 1.4 874:29.30 ceph-osd
14186 ceph 20 0 3441852 1.6g 13360 S 2.6 1.2 956:32.32 ceph-osd
14192 ceph 20 0 3412640 1.9g 13512 S 2.6 1.5 923:03.59 ceph-osd
13526 ceph 20 0 1012048 496332 11208 S 2.3 0.4 349:36.89 ceph-mon
13789 ceph 20 0 3432876 1.6g 12464 S 2.3 1.2 922:46.59 ceph-osd
13798 ceph 20 0 3405708 1.6g 13508 S 2.3 1.3 873:51.73 ceph-osd
14217 ceph 20 0 3443436 1.8g 13560 S 2.3 1.4 902:55.74 ceph-osd
13774 ceph 20 0 3427560 1.9g 13656 S 2.0 1.5 932:19.26 ceph-osd
13775 ceph 20 0 3367284 1.6g 13940 S 2.0 1.3 878:39.76 ceph-osd
13814 ceph 20 0 4243252 1.5g 13500 S 2.0 1.2 2020:42 ceph-osd
13985 ceph 20 0 3487848 1.6g 13100 S 2.0 1.3 942:23.39 ceph-osd
13804 ceph 20 0 3428120 1.9g 13192 S 1.7 1.5 865:32.71 ceph-osd
14025 ceph 20 0 3426360 1.8g 13780 S 1.7 1.5 994:17.32 ceph-osd
10339 nobody 20 0 6256048 537184 60136 S 1.0 0.4 239:38.44 prometheus
17547 nobody 20 0 128448 8572 0 S 1.0 0.0 31:54.99 alertmanager
 127115 root      20   0   10172   4636   3392 R   0.7   0.0   0:00.43 top

---
OS: Ubuntu 20.04
128GB of memory
Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
X11SPL-F

---
Ceph Versions
{
    "mon": {
"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
(stable)": 3
    },
    "mgr": {
"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
(stable)": 4
    },
    "osd": {
"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
(stable)": 140
    },
    "mds": {
"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
(stable)": 4
    },
    "ctdb": {
"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
(stable)": 1
    },
    "rgw": {
"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
(stable)": 2
    },
    "overall": {
"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
(stable)": 154
    }
}
---
  cluster:
    id:     388dda42-9dd0-4858-a978-b3dc4c3b9152
    health: HEALTH_OK
  services:
    mon:  3 daemons, quorum jarn29,jarn30,jarn31 (age 8d)
    mgr:  osd31(active, since 10m), standbys: osd30, osd29, osd32
    mds:  2/2 daemons up, 2 standby
    osd:  140 osds: 140 up (since 3d), 140 in (since 3d)
          flags noautoscale
    ctdb: 1 daemon active (1 hosts)
    rgw:  2 daemons active (2 hosts, 1 zones)
  data:
    volumes: 1/1 healthy
    pools:   19 pools, 5024 pgs
    objects: 792.78M objects, 865 TiB
    usage:   1.4 PiB used, 586 TiB / 2.0 PiB avail
    pgs:     5013 active+clean
             11   active+clean+scrubbing+deep
  io:
    client:   19 MiB/s rd, 102 MiB/s wr, 672 op/s rd, 351 op/s wr

---

I've run perf and included the ceph-mgr log for the first system that is displaying the issue:

Perf - https://imgur.com/a/VMh4tDf
mgr log while accessing RBD tab - https://pastebin.com/t96WCWfc
mgr logs prior to clicking RBD - https://pastebin.com/e4dtuD3i
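In case anyone wants to capture similar logs, mgr/dashboard verbosity can be raised roughly like this before clicking around (the levels here are examples only, not necessarily what I used):

  ceph config set mgr debug_mgr 4/5
  ceph dashboard debug enable
  # reproduce, collect the log, then revert:
  ceph dashboard debug disable
  ceph config rm mgr debug_mgr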

---

Apologies for the formatting; this is my first time posting here.

If anything else is needed please let me know!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

