Hi,

The warning and danger indicators in the capacity chart correspond to the nearfull and full ratios set on the cluster; their default values are 85% and 95% respectively. You can run `ceph osd dump | grep ratio` to see them. When this was introduced, there was a blog post <https://ceph.io/en/news/blog/2023/landing-page/#capacity-card> explaining how the ratios are mapped onto the chart. When your used storage crosses the 85% mark (the nearfull ratio), the chart turns yellow to alert you, and when it crosses 95% (the full ratio) it turns red. That doesn't mean the cluster is in bad shape; it's a visual indicator that you are running out of storage.

Regarding the Cluster Utilization chart, it now gets its metrics directly from Prometheus so that it can show time-series data in the UI, rather than only the metrics at the current point in time (as earlier versions did). So if you have Prometheus configured and its URL is provided in the dashboard settings via `ceph dashboard set-prometheus-api-host <url-of-prometheus>`, you should be able to see the metrics.

In case you want to read more about the new page, you can check <https://docs.ceph.com/en/latest/mgr/dashboard/#overview-of-the-dashboard-landing-page>.
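For quick reference, here is roughly what that looks like on the command line. This is only a sketch: the ratio values shown are the defaults, and the Prometheus URL (using its default port 9090) is a placeholder you would replace with your own endpoint:

    # show the current thresholds the capacity chart is keyed to
    $ ceph osd dump | grep ratio
    full_ratio 0.95
    backfillfull_ratio 0.9
    nearfull_ratio 0.85

    # the thresholds can be adjusted if the defaults don't suit you
    $ ceph osd set-nearfull-ratio 0.85
    $ ceph osd set-full-ratio 0.95

    # point the dashboard at Prometheus so the Cluster Utilization
    # charts can pull time-series data
    $ ceph dashboard set-prometheus-api-host http://prometheus.example.com:9090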
Regards,
Nizam

On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin <hgcoin@xxxxxxxxx> wrote:

> Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
> capacity is 'warned', and 95% is 'in danger'. There is no hint given
> as to the nature of the danger or reason for the warning. Though
> apparently with merely 5% of my ceph world 'normal', the cluster reports
> 'ok'. Which, you know, seems contradictory. I've used just under 40%
> of capacity.
>
> Further down the dashboard, all the subsections of 'Cluster Utilization'
> are '1' and '0.5' with nothing whatever in the graphics area.
>
> Previous versions of ceph presented a normal dashboard.
>
> It's just a little half rack, 5 hosts, a few physical drives each, been
> running ceph for a couple years now. Orchestrator is cephadm. It's
> just about as 'plain vanilla' as it gets. I've had to mute one alert,
> because cephadm refresh aborts when it finds drives on any host that
> have nothing to do with ceph that don't have a blkid_ip 'TYPE' key.
> Seems unrelated to a totally messed up dashboard. (The tracker for that
> is here: https://tracker.ceph.com/issues/63502 ).
>
> Any idea what the steps are to get useful stuff back on the dashboard?
> Any idea where I can learn what my 85% danger and 95% warning is
> 'about'? (You'd think 'danger' (The volcano is blowing up now!) would
> be worse than 'warning' (the volcano might blow up soon), so how can
> warning+danger > 100%, or if not additive how can warning < danger?)
>
> Here's a bit of detail:
>
> root@noc1:~# ceph -s
>   cluster:
>     id:     4067126d-01cb-40af-824a-881c130140f8
>     health: HEALTH_OK
>             (muted: CEPHADM_REFRESH_FAILED)
>
>   services:
>     mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
>     mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
>          noc3.sybsfb, noc1.jtteqg
>     mds: 1/1 daemons up, 3 standby
>     osd: 27 osds: 27 up (since 20m), 27 in (since 2d)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   16 pools, 1809 pgs
>     objects: 12.29M objects, 17 TiB
>     usage:   44 TiB used, 67 TiB / 111 TiB avail
>     pgs:     1793 active+clean
>              9    active+clean+scrubbing
>              7    active+clean+scrubbing+deep
>
>   io:
>     client: 5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx