Yup, that does look like a huge difference.

@Pedro Gonzalez Gomez <pegonzal@xxxxxxxxxx> @Aashish Sharma
<aasharma@xxxxxxxxxx> @Ankush Behl <anbehl@xxxxxxxxxx> Could you guys help
here? Did we miss any fixes for 18.2.2?

Regards,

On Thu, Mar 14, 2024 at 2:17 AM Harry G Coin <hgcoin@xxxxxxxxx> wrote:

> Thanks! Oddly, all the dashboard checks you suggest appear normal, yet
> the result remains broken.
>
> Even before applying your instruction about the dashboard, I already had
> this result:
>
> root@noc3:~# ceph dashboard get-prometheus-api-host
> http://noc3.1.quietfountain.com:9095
> root@noc3:~# netstat -6nlp | grep 9095
> tcp6   0   0 :::9095   :::*   LISTEN   80963/prometheus
> root@noc3:~#
>
> To check it, I tried setting it to something random; the browser aimed at
> the dashboard site then reported no connection. The error message ended
> when I restored the above. But the graphs remain empty, with only the
> numbers 1 and 0.5 on each.
>
> Regarding the used storage, notice the overall usage is 43.6 TiB of 111
> TiB. That seems quite a distance from the warning trigger points of 85%
> and 95%? The default values are in use. All the OSDs are between 37% and
> 42% usage. What am I missing?
>
> Thanks!
>
>
> On 3/12/24 02:07, Nizamudeen A wrote:
>
> Hi,
>
> The warning and danger indicators in the capacity chart point to the
> nearfull and full ratios set on the cluster; the default values for them
> are 85% and 95% respectively. You can do a `ceph osd dump | grep ratio`
> and see those (a quick check is also sketched after this thread).
>
> When this got introduced, there was a blog post
> <https://ceph.io/en/news/blog/2023/landing-page/#capacity-card> explaining
> how this is mapped in the chart. When your used storage crosses that 85%
> mark, the chart is colored yellow to alert the user, and when it crosses
> 95% (or the full ratio) the chart is colored red. That doesn't mean the
> cluster is in bad shape; it's a visual indicator telling you that you are
> running out of storage.
>
> Regarding the Cluster Utilization chart, it gets metrics directly from
> prometheus so that it can show time-series data in the UI rather than
> metrics at a single point in time (which was used before). So if you have
> prometheus configured and its URL is provided in the dashboard settings
> (`ceph dashboard set-prometheus-api-host <url-of-prometheus>`), then you
> should be able to see the metrics (a verification sketch also follows the
> thread).
>
> In case you need to read more about the new page, you can check here:
> <https://docs.ceph.com/en/latest/mgr/dashboard/#overview-of-the-dashboard-landing-page>.
>
> Regards,
> Nizam
>
>
> On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin <hgcoin@xxxxxxxxx> wrote:
>
>> Looking at ceph -s, all is well. Looking at the dashboard, 85% of my
>> capacity is 'warned', and 95% is 'in danger'. There is no hint given
>> as to the nature of the danger or reason for the warning. Though
>> apparently with merely 5% of my ceph world 'normal', the cluster reports
>> 'ok'. Which, you know, seems contradictory. I've used just under 40%
>> of capacity.
>>
>> Further down the dashboard, all the subsections of 'Cluster Utilization'
>> are '1' and '0.5' with nothing whatever in the graphics area.
>>
>> Previous versions of ceph presented a normal dashboard.
>>
>> It's just a little half rack, 5 hosts, a few physical drives each, been
>> running ceph for a couple years now. Orchestrator is cephadm. It's
>> just about as 'plain vanilla' as it gets. I've had to mute one alert,
>> because cephadm refresh aborts when it finds drives on any host that
>> have nothing to do with ceph and don't have a blkid_ip 'TYPE' key.
>> Seems unrelated to a totally messed up dashboard. (The tracker for that
>> is here: https://tracker.ceph.com/issues/63502 ).
>>
>> Any idea what the steps are to get useful stuff back on the dashboard?
>> Any idea where I can learn what my 85% warning and 95% danger is
>> 'about'? (You'd think 'danger' (the volcano is blowing up now!) would
>> be worse than 'warning' (the volcano might blow up soon), so how can
>> warning + danger > 100%, or if not additive, how can warning < danger?)
>>
>> Here's a bit of detail:
>>
>> root@noc1:~# ceph -s
>>   cluster:
>>     id:     4067126d-01cb-40af-824a-881c130140f8
>>     health: HEALTH_OK
>>             (muted: CEPHADM_REFRESH_FAILED)
>>
>>   services:
>>     mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
>>     mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
>>          noc3.sybsfb, noc1.jtteqg
>>     mds: 1/1 daemons up, 3 standby
>>     osd: 27 osds: 27 up (since 20m), 27 in (since 2d)
>>
>>   data:
>>     volumes: 1/1 healthy
>>     pools:   16 pools, 1809 pgs
>>     objects: 12.29M objects, 17 TiB
>>     usage:   44 TiB used, 67 TiB / 111 TiB avail
>>     pgs:     1793 active+clean
>>              9    active+clean+scrubbing
>>              7    active+clean+scrubbing+deep
>>
>>   io:
>>     client:   5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
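
A minimal sketch of the capacity-threshold check described in the thread,
assuming the stock defaults; the commented output lines show what the
defaults look like and will differ if the ratios were changed on a cluster:

# Thresholds the dashboard capacity card is drawn against
ceph osd dump | grep ratio
#   full_ratio 0.95
#   backfillfull_ratio 0.9
#   nearfull_ratio 0.85

# Cross-check actual raw usage per OSD and per pool
ceph osd df
ceph df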
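
And a sketch for checking the Prometheus path end to end when the Cluster
Utilization graphs stay empty. The host and port are taken from the output
quoted in the thread (noc3.1.quietfountain.com:9095) and should be replaced
for other clusters; ceph_health_status is assumed here as one of the metrics
exported by the mgr prometheus module.

# What the dashboard is pointed at
ceph dashboard get-prometheus-api-host

# Ask Prometheus directly whether it is scraping any targets
curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/query?query=up'

# ...and whether it holds any Ceph metrics at all
curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/query?query=ceph_health_status'

# If those queries come back empty, confirm the exporter side is enabled
ceph mgr module ls | grep prometheus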