Yup, that does look like a huge difference.

@Pedro Gonzalez Gomez <pegonzal@xxxxxxxxxx> @Aashish Sharma
<aasharma@xxxxxxxxxx> @Ankush Behl <anbehl@xxxxxxxxxx> Could you guys help
here? Did we miss any fixes for 18.2.2?

Regards,

On Thu, Mar 14, 2024 at 2:17 AM Harry G Coin <hgcoin@xxxxxxxxx> wrote:

> Thanks! Oddly, all the dashboard checks you suggest appear normal, yet
> the result remains broken.
>
> Even before applying your instruction about the dashboard, I already had
> this result:
>
> root@noc3:~# ceph dashboard get-prometheus-api-host
> http://noc3.1.quietfountain.com:9095
> root@noc3:~# netstat -6nlp | grep 9095
> tcp6   0   0 :::9095   :::*   LISTEN   80963/prometheus
> root@noc3:~#
>
> To check it, I tried setting it to something random; the browser aimed at
> the dashboard site then reported no connection. The error message ended
> when I restored the above. But the graphs remain empty, with only the
> numbers 1 and 0.5 on each.
>
> Regarding the used storage, notice the overall usage is 43.6 TiB of 111
> TiB. That seems quite a distance from the warning trigger points of 85%
> and 95%? The default values are in use. All the OSDs are between 37% and
> 42% usage. What am I missing?
>
> Thanks!
>
>
> On 3/12/24 02:07, Nizamudeen A wrote:
>
> Hi,
>
> The warning and danger indicators in the capacity chart point to the
> nearfull and full ratios set on the cluster; the default values for them
> are 85% and 95% respectively. You can do a `ceph osd dump | grep ratio`
> and see those (a quick check is also sketched after this thread).
>
> When this got introduced, there was a blog post
> <https://ceph.io/en/news/blog/2023/landing-page/#capacity-card> explaining
> how this is mapped in the chart. When your used storage crosses that 85%
> mark, the chart is colored yellow to alert the user, and when it crosses
> 95% (or the full ratio) the chart is colored red. That doesn't mean the
> cluster is in bad shape; it's a visual indicator telling you that you are
> running out of storage.
>
> Regarding the Cluster Utilization chart, it gets metrics directly from
> prometheus so that it can show time-series data in the UI rather than
> metrics at a single point in time (which was used before). So if you have
> prometheus configured and its URL is provided in the dashboard settings
> (`ceph dashboard set-prometheus-api-host <url-of-prometheus>`), then you
> should be able to see the metrics (a verification sketch also follows the
> thread).
>
> In case you need to read more about the new page, you can check here:
> <https://docs.ceph.com/en/latest/mgr/dashboard/#overview-of-the-dashboard-landing-page>.
>
> Regards,
> Nizam
>
>
> On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin <hgcoin@xxxxxxxxx> wrote:
>
>> Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
>> capacity is 'warned', and 95% is 'in danger'. There is no hint given
>> as to the nature of the danger or reason for the warning. Though
>> apparently with merely 5% of my ceph world 'normal', the cluster reports
>> 'ok'. Which, you know, seems contradictory. I've used just under 40%
>> of capacity.
>>
>> Further down the dashboard, all the subsections of 'Cluster Utilization'
>> are '1' and '0.5' with nothing whatever in the graphics area.
>>
>> Previous versions of ceph presented a normal dashboard.
>>
>> It's just a little half rack, 5 hosts, a few physical drives each, been
>> running ceph for a couple years now. Orchestrator is cephadm. It's
>> just about as 'plain vanilla' as it gets. I've had to mute one alert,
>> because cephadm refresh aborts when it finds drives on any host that
>> have nothing to do with ceph and don't have a blkid_ip 'TYPE' key.
>> Seems unrelated to a totally messed up dashboard. (The tracker for that
>> is here: https://tracker.ceph.com/issues/63502 ).
>>
>> Any idea what the steps are to get useful stuff back on the dashboard?
>> Any idea where I can learn what my 85% warning and 95% danger is
>> 'about'? (You'd think 'danger' (the volcano is blowing up now!) would
>> be worse than 'warning' (the volcano might blow up soon), so how can
>> warning + danger > 100%, or if not additive, how can warning < danger?)
>>
>> Here's a bit of detail:
>>
>> root@noc1:~# ceph -s
>>   cluster:
>>     id:     4067126d-01cb-40af-824a-881c130140f8
>>     health: HEALTH_OK
>>             (muted: CEPHADM_REFRESH_FAILED)
>>
>>   services:
>>     mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
>>     mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
>>          noc3.sybsfb, noc1.jtteqg
>>     mds: 1/1 daemons up, 3 standby
>>     osd: 27 osds: 27 up (since 20m), 27 in (since 2d)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   16 pools, 1809 pgs
>>     objects: 12.29M objects, 17 TiB
>>     usage:   44 TiB used, 67 TiB / 111 TiB avail
>>     pgs:     1793 active+clean
>>              9    active+clean+scrubbing
>>              7    active+clean+scrubbing+deep
>>
>>   io:
>>     client:   5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
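
A minimal sketch of the capacity-threshold check described in the thread,
assuming the stock defaults; the commented output lines show what the
defaults look like and will differ if the ratios were changed on a cluster:

# Thresholds the dashboard capacity card is drawn against
ceph osd dump | grep ratio
#   full_ratio 0.95
#   backfillfull_ratio 0.9
#   nearfull_ratio 0.85

# Cross-check actual raw usage per OSD and per pool
ceph osd df
ceph df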
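
And a sketch for checking the Prometheus path end to end when the Cluster
Utilization graphs stay empty. The host and port are taken from the output
quoted in the thread (noc3.1.quietfountain.com:9095) and should be replaced
for other clusters; ceph_health_status is assumed here as one of the metrics
exported by the mgr prometheus module.

# What the dashboard is pointed at
ceph dashboard get-prometheus-api-host

# Ask Prometheus directly whether it is scraping any targets
curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/query?query=up'

# ...and whether it holds any Ceph metrics at all
curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/query?query=ceph_health_status'

# If those queries come back empty, confirm the exporter side is enabled
ceph mgr module ls | grep prometheus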