Hi,

could you share a screenshot (in some pastebin)? I'm not sure what
exactly you're seeing, but muting apparently works the way I
understand it.

> I wish I knew whether mon/mgr host changes erase all the mutings.
> I have just now given the command "ceph health mute OSD_UNREACHABLE
> 180d" -- for the second time this week, and now the board shows
> green. I scrolled back through the command list to verify I did
> this. Indeed it was there.

I'm not aware that a MGR failover would unmute anything, that sounds
unlikely to me.

> Is there a command line that lists active mutings -- one that's not
> used by the dashboard, apparently?

You can still check 'ceph health detail', which will also show the
muted warning:

ceph:~ # ceph health detail
HEALTH_OK (muted: MON_DISK_LOW)
(MUTED) [WRN] MON_DISK_LOW: mon ceph is low on available space
    mon.ceph has 40% avail
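I don't know of a dedicated subcommand that lists only the active
mutes, but the same information should be available in the JSON
output. Something along these lines might work -- just a sketch,
assuming your release exposes a per-check "muted" flag and jq is
installed:

# list every health check together with its mute state (sketch)
ceph health detail -f json | jq -r \
  '.checks | to_entries[] | "\(.key)  muted=\(.value.muted // false)"'

# or show only the muted checks
ceph health detail -f json | jq -r \
  '.checks | to_entries[] | select(.value.muted == true) | .key'

That would at least let you compare what the CLI considers muted with
what the dashboard shows.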
Quoting Harry G Coin <hgcoin@xxxxxxxxx>:
Yes, all the errors and warnings list as 'suppressed'. Doesn't
affect the bug as reported below.
Of some interest, "OSD_UNREACHABLE" is not listed on the dashboard
alert roster of problems, but is in the command line health detail.
But really, when all the errors list as 'suppressed', whatever they
are, then the dashboard should show green. Instead it flashes red,
along with !Critical as detailed below.
I suspect what's really going on is that the detection method behind
the 'red / yellow / green' decision and the !Critical decision is
different from simply checking whether the number of unsilenced
errors is greater than zero -- even allowing for the possibility that
many errors exist which could trigger HEALTH_ERR yet have no entry in
the roster of alerts.
I wish I knew whether mon/mgr host changes erase all the mutings.
I have just now given the command "ceph health mute OSD_UNREACHABLE
180d" -- for the second time this week, and now the board shows
green. I scrolled back through the command list to verify I did
this. Indeed it was there. Is there a command line that lists
active mutings -- one that's not used by the dashboard, apparently?
On 2/10/25 14:00, Eugen Block wrote:
Hi,
did you also mute the osd_unreachable warning?
ceph health mute OSD_UNREACHABLE 10w
Should bring the cluster back to HEALTH_OK for 10 weeks.
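As far as I know, a mute is normally cleared automatically if the
check gets worse (e.g. the number of affected OSDs grows), so adding
--sticky may be worth a try. The mute can also be removed again with
unmute. Roughly (a sketch, the TTL is just an example):

# mute for 10 weeks and keep the mute even if the check changes
ceph health mute OSD_UNREACHABLE 10w --sticky
# verify it now shows up as (MUTED)
ceph health detail
# drop the mute once the underlying issue is fixed
ceph health unmute OSD_UNREACHABLE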
Quoting Harry G Coin <hgcoin@xxxxxxxxx>:
Hi Nizam
Answers interposed below.
On 2/10/25 11:56, Nizamudeen A wrote:
Hey Harry,
Do you see that for every alert or only for some of them? If some,
which ones? I just tried a couple of them locally and saw the
dashboard go to a happy state.
My sandbox/dev array has three chronic warnings/errors. The first
is a PG imbalance I'm aware of. The second is that all 27 OSDs are
unreachable. The third is that the array has been in an error state
for more than 5 minutes. Silencing/suppressing all of them still
gives the 'red flashing broken dot' on the dashboard, the !Cluster
status, and a notice of Alerts listing the previously suppressed
errors/warnings. Under 'Observability' we see no indication of
errors/warnings under the 'Alerts' menu option -- so you got that
one right.
Can you tell me how ceph health or ceph health detail looks after
muting the alert? And does ceph -s report HEALTH_OK?
root@noc1:~# ceph -s
  cluster:
    id:     40671....140f8
    health: HEALTH_ERR
            27 osds(s) are not reachable

  services:
    mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 10m)
    mgr: noc1.jxxxx(active, since 37m), standbys: noc2.yhxxxxx,
         noc3.xxxxb, noc4.txxxxc
    mds: 1/1 daemons up, 3 standby
    osd: 27 osds: 27 up (since 14m), 27 in (since 5w)
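If it helps the comparison, the overall status string -- which is
what the dashboard's red/yellow/green should boil down to -- can be
pulled straight from the status JSON. A sketch, assuming jq is
available:

ceph status -f json | jq -r '.health.status'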
Ceph's actual core operations are otherwise normal.
It's hard to sell Ceph as a concept when it shows all the storage as
unreachable and, at the same time, up and in. Not a big confidence
builder.
Regards,
Nizam
On Mon, Feb 10, 2025 at 9:00 PM Harry G Coin <hgcoin@xxxxxxxxx> wrote:
In the same code area: even if all the alerts are silenced, the
dashboard will not show 'green', but red or yellow depending on the
nature of the silenced alerts.
On 2/10/25 04:18, Nizamudeen A wrote:
> Thank you Chris,
>
> I was able to reproduce this. We will look into it and send out
> a fix.
>
> Regards,
> Nizam
>
> On Fri, Feb 7, 2025 at 10:35 PM Chris Palmer <chris.palmer@xxxxxxxxx> wrote:
>
>> Firstly thank you so much for the 19.2.1 release. Initial testing
>> suggests that the blockers that we had in 19.2.0 have all been resolved,
>> so we are proceeding with further testing.
>>
>> We have noticed one small problem in 19.2.1 that was not present in
>> 19.2.0 though. We use the older-style dashboard
>> (mgr/dashboard/FEATURE_TOGGLE_DASHBOARD false). The problem happens on
>> the Dashboard screen when health changes to WARN. If you click on WARN
>> you get a small empty dropdown instead of the list of warnings. A
>> javascript error is logged, and using browser inspection there is the
>> additional bit of info that it happens in polyfill:
>> 2025-02-07T15:59:00.970+0000 7f1d63877640 0 [dashboard ERROR
>> frontend.error] (https://<redacted>:8443/#/dashboard): NG0901
>> Error: NG0901
>>     at d.find (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3342365)
>>     at le.ngDoCheck (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3173112)
>>     at Qe (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3225586)
>>     at bt (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3225341)
>>     at cs (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3225051)
>>     at $m (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3259043)
>>     at jf (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3266563)
>>     at S1 (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3259790)
>>     at $m (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3259801)
>>     at fg (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3267248)
>>
>> Also, after this happens, no dropdowns work again until the page is
>> forcibly refreshed.
>>
>> Environment is an RPM install on CentOS 9 Stream.
>>
>> I've created issue [0].
>>
>> Thanks, Chris
>>
>> [0] https://tracker.ceph.com/issues/69867
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx