Re: 18.2.2 dashboard really messed up.

Thanks!  Oddly, all the dashboard checks you suggested come back normal, yet the dashboard remains broken.

Even before applying your suggestion about the dashboard setting, I already had this result:

root@noc3:~# ceph dashboard get-prometheus-api-host
http://noc3.1.quietfountain.com:9095
root@noc3:~# netstat -6nlp | grep 9095
tcp6       0      0 :::9095                :::*                    LISTEN      80963/prometheus
root@noc3:~#

To confirm the setting is actually being used, I changed it to something random; the browser pointed at the dashboard then reported it could not connect.  The error went away when I restored the value above.  But the graphs remain empty, showing only the numbers 1 and 0.5 on each.
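One more check that should be possible (this is just the generic Prometheus HTTP API, nothing dashboard-specific, and the hostname is simply the one from my config above) is to query Prometheus directly and confirm it actually holds ceph metrics:

# Does Prometheus answer queries at all?
curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/query?query=up'

# Does it know about any ceph_* series? (lists stored metric names, filtered)
curl -s 'http://noc3.1.quietfountain.com:9095/api/v1/label/__name__/values' | grep -o '"ceph_[^"]*"' | head

If those come back empty, the dashboard would have nothing to graph even with the correct API host configured.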

Regarding the used storage: the overall usage is 43.6 of 111 TiB, which seems a long way from the warning trigger points of 85% and 95%.  The default ratio values are in use, and all the OSDs sit between 37% and 42% usage.  What am I missing?
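For reference, here is how I am reading those numbers (standard commands; the defaults I quote are the documented ones, so treat the exact figures as my assumption):

# Show the nearfull/full ratios the capacity chart thresholds come from
# (documented defaults: nearfull_ratio 0.85, full_ratio 0.95).
ceph osd dump | grep ratio

# Per-OSD utilization; every OSD here reports roughly 37-42% use,
# nowhere near the 85% nearfull threshold.
ceph osd df

So the overall usage works out to roughly 39% (43.6 of 111 TiB), well below both thresholds.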

Thanks!



On 3/12/24 02:07, Nizamudeen A wrote:
Hi,

The warning and danger indicators in the capacity chart point to the nearfull and full ratios set on the cluster, and the default values for them are 85% and 95% respectively. You can run `ceph osd dump | grep ratio` to see them.

When this was introduced, there was a blog post <https://ceph.io/en/news/blog/2023/landing-page/#capacity-card> explaining how these ratios are mapped onto the chart. When your used storage crosses the 85% mark, the chart is colored yellow to warn the user, and when it crosses 95% (or the full ratio) the chart is colored red. That doesn't mean the cluster is in bad shape; it's a visual indicator that you
are running out of storage.

Regarding the Cluster Utilization chart, it gets its metrics directly from Prometheus so that it can show time-series data in the UI rather than only the metrics at the current point in time (which is what was used before). So if you have Prometheus configured for the dashboard and its URL is provided in the dashboard settings via `ceph dashboard set-prometheus-api-host <url-of-prometheus>`,
then you should be able to see the metrics.
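For example, a minimal sketch of the wiring (the hostname is just a placeholder, and the ssl-verify step only matters if Prometheus sits behind a self-signed certificate):

# Point the dashboard at the Prometheus HTTP API
ceph dashboard set-prometheus-api-host http://<prometheus-host>:9095

# Confirm what the dashboard currently has configured
ceph dashboard get-prometheus-api-host

# Optional: skip TLS verification for a self-signed Prometheus endpoint
ceph dashboard set-prometheus-api-ssl-verify False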

In case you want to read more about the new page, you can check the docs here <https://docs.ceph.com/en/latest/mgr/dashboard/#overview-of-the-dashboard-landing-page>.

Regards,
Nizam



On Mon, Mar 11, 2024 at 11:47 PM Harry G Coin <hgcoin@xxxxxxxxx> wrote:

    Looking at ceph -s, all is well.  Looking at the dashboard, 85% of my
    capacity is 'warned', and 95% is 'in danger'.  There is no hint given
    as to the nature of the danger or reason for the warning.  Though
    apparently with merely 5% of my ceph world 'normal', the cluster
    reports 'ok'.  Which, you know, seems contradictory.  I've used just
    under 40% of capacity.

    Further down the dashboard, all the subsections of 'Cluster
    Utilization' are '1' and '0.5' with nothing whatever in the graphics
    area.

    Previous versions of ceph presented a normal dashboard.

    It's just a little half rack, 5 hosts, a few physical drives each,
    been running ceph for a couple of years now.  Orchestrator is
    cephadm.  It's just about as 'plain vanilla' as it gets.  I've had to
    mute one alert, because cephadm refresh aborts when it finds drives
    on any host that have nothing to do with ceph and don't have a
    blkid_ip 'TYPE' key.  That seems unrelated to a totally messed up
    dashboard.  (The tracker for that is here:
    https://tracker.ceph.com/issues/63502 ).

    Any idea what the steps are to get useful stuff back on the
    dashboard?  Any idea where I can learn what my 85% danger and 95%
    warning are 'about'?  (You'd think 'danger' (the volcano is blowing
    up now!) would be worse than 'warning' (the volcano might blow up
    soon), so how can warning + danger > 100%, or if not additive, how
    can warning < danger?)

      Here's a bit of detail:

    root@noc1:~# ceph -s
      cluster:
        id:     4067126d-01cb-40af-824a-881c130140f8
        health: HEALTH_OK
                (muted: CEPHADM_REFRESH_FAILED)

      services:
        mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 70m)
        mgr: noc2.yhyuxd(active, since 82m), standbys: noc4.tvhgac,
    noc3.sybsfb, noc1.jtteqg
        mds: 1/1 daemons up, 3 standby
        osd: 27 osds: 27 up (since 20m), 27 in (since 2d)

      data:
        volumes: 1/1 healthy
        pools:   16 pools, 1809 pgs
        objects: 12.29M objects, 17 TiB
        usage:   44 TiB used, 67 TiB / 111 TiB avail
        pgs:     1793 active+clean
                 9    active+clean+scrubbing
                 7    active+clean+scrubbing+deep

      io:
        client:   5.6 MiB/s rd, 273 KiB/s wr, 41 op/s rd, 58 op/s wr

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



