Yes, all the errors and warnings are listed as 'suppressed'. That
doesn't affect the bug as reported below.
Of some interest: "OSD_UNREACHABLE" is not listed in the dashboard's
alert roster of problems, but it does appear in the command-line health
detail.
But really, when all the errors are listed as 'suppressed', whatever
they are, the dashboard should show green. Instead it flashes red,
along with !Critical, as detailed below.
I suspect what's really going on is that the logic driving the
'red / yellow / green' indicator and the !Critical badge is different
from simply checking whether the number of unsilenced errors is greater
than zero, even allowing for the possibility that many errors which
could trigger HEALTH_ERR have no entry in the roster of alerts at all.
I wish I knew whether mon/mgr host changes erase all the mutings. I
have just now given the command "ceph health mute OSD_UNREACHABLE 180d"
-- for the second time this week -- and now the board shows green. I
scrolled back through the command history to verify I had done this
before; indeed it was there. Is there a command line that lists the
active mutings -- one that the dashboard apparently isn't using?
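(As a sanity check, the two places I know of to look for active mutes
from the command line -- assuming the muted state really is exposed
there; I haven't confirmed the exact wording:

   ceph health detail                  # muted checks should still be listed, flagged as muted
   ceph health detail -f json-pretty   # structured health report; whether each check
                                       # carries a muted flag here is what I'd want to confirm

If the dashboard consumes the same structured health report, the
red/yellow/green decision ought to be able to skip the muted entries.)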
On 2/10/25 14:00, Eugen Block wrote:
Hi,
did you also mute the osd_unreachable warning?
ceph health mute OSD_UNREACHABLE 10w
That should bring the cluster back to HEALTH_OK for 10 weeks.
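If the mute keeps getting cancelled (for instance when the check's
details change or get worse), a sticky mute may help -- a sketch,
assuming --sticky and unmute behave as documented:

   ceph health mute OSD_UNREACHABLE 10w --sticky   # keep the mute even if the check's details change or worsen
   ceph health unmute OSD_UNREACHABLE              # lift the mute once the underlying issue is resolved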
Quoting Harry G Coin <hgcoin@xxxxxxxxx>:
Hi Nizam
Answers interposed below.
On 2/10/25 11:56, Nizamudeen A wrote:
Hey Harry,
Do you see that for every alert or only for some of them? If some,
which ones are those? I just tried a couple of them locally and saw
the dashboard go to a happy state.
My sandbox/dev array has three chronic 'warnings/errors'. The first
is a PG imbalance I'm aware of. The second is that all 27 osds are
unreachable. The third is that the array has been in an error state
for more than 5 minutes. Silencing/suppressing all of them still
gives the 'red flashing broken dot' on the dashboard, the !Cluster
status, and a notice of Alerts listing the previously suppressed
errors/warnings. Under 'Observability', the 'Alerts' menu option shows
no indication of errors/warnings -- so you got that one right.
Can you tell me what the ceph health or ceph health detail output
looks like after the alert is muted? And also, does ceph -s report
HEALTH_OK?
root@noc1:~# ceph -s
  cluster:
    id:     40671....140f8
    health: HEALTH_ERR
            27 osds(s) are not reachable

  services:
    mon: 5 daemons, quorum noc4,noc2,noc1,noc3,sysmon1 (age 10m)
    mgr: noc1.jxxxx(active, since 37m), standbys: noc2.yhxxxxx, noc3.xxxxb, noc4.txxxxc
    mds: 1/1 daemons up, 3 standby
    osd: 27 osds: 27 up (since 14m), 27 in (since 5w)
Ceph's actual core operations are otherwise normal.
It's hard to sell Ceph as a concept when it reports all the storage as
unreachable and, at the same time, up and in. Not a big confidence
builder.
Regards,
Nizam
On Mon, Feb 10, 2025 at 9:00 PM Harry G Coin <hgcoin@xxxxxxxxx> wrote:
In the same code area: even if all the alerts are silenced, the
dashboard will not show 'green', but red or yellow depending on the
nature of the silenced alerts.
On 2/10/25 04:18, Nizamudeen A wrote:
> Thank you Chris,
>
> I was able to reproduce this. We will look into it and send out
a fix.
>
> Regards,
> Nizam
>
> On Fri, Feb 7, 2025 at 10:35 PM Chris Palmer
<chris.palmer@xxxxxxxxx> wrote:
>
>> Firstly thank you so much for the 19.2.1 release. Initial testing
>> suggests that the blockers that we had in 19.2.0 have all been
resolved,
>> so we are proceeding with further testing.
>>
>> We have noticed one small problem in 19.2.1 that was not
present in
>> 19.2.0 though. We use the older-style dashboard
>> (mgr/dashboard/FEATURE_TOGGLE_DASHBOARD false). The problem
happens on
>> the Dashboard screen when health changes to WARN. If you click
on WARN
>> you get a small empty dropdown instead of the list of warnings. A
>> JavaScript error is logged, and browser inspection gives the
>> additional detail that it happens in polyfill:
>>
>> 2025-02-07T15:59:00.970+0000 7f1d63877640 0 [dashboard ERROR
>> frontend.error] (https://<redacted>:8443/#/dashboard): NG0901
>> Error: NG0901
>>     at d.find (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3342365)
>>     at le.ngDoCheck (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3173112)
>>     at Qe (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3225586)
>>     at bt (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3225341)
>>     at cs (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3225051)
>>     at $m (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3259043)
>>     at jf (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3266563)
>>     at S1 (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3259790)
>>     at $m (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3259801)
>>     at fg (https://<redacted>:8443/main.7869bccdd1b73f3c.js:3:3267248)
>>
>> Also, after this happens, no dropdowns work again until the
page is
>> forcibly refreshed.
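>> (For anyone trying to reproduce with the same setup: we select the
>> older-style dashboard via the mgr module option mentioned above. A
>> sketch, assuming the usual mgr module config syntax applies to this
>> toggle:
>>
>>    ceph config set mgr mgr/dashboard/FEATURE_TOGGLE_DASHBOARD false
>>    ceph mgr fail   # a mgr failover may be needed for the dashboard to pick it up
>> )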
>>
>> Environment is an RPM install on CentOS 9 Stream.
>>
>> I've created issue [0].
>>
>> Thanks, Chris
>>
>> [0] https://tracker.ceph.com/issues/69867
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx