On Thu, 13 Jun 2019, Neha Ojha wrote:
> Hi everyone,
>
> There has been some interest in a feature that helps users to mute
> health warnings. There is a trello card[1] associated with it and
> we've had some discussion[2] in the past in a CDM about it. In
> general, we want to understand a few things:
>
> 1. what is the level of interest in this feature
> 2. for how long should we mute these warnings - should the period be
> decided by us or the user
> 3. possible misuse of this feature and negative impacts of muting some warnings
>
> Let us know what you think.
>
> [1] https://trello.com/c/vINMkfTf/358-mute-health-warnings
> [2] https://pad.ceph.com/p/cephalocon-usability-brainstorming

What if we start with something like:

- a 'mute' targets a specific warning code (e.g., OSD_DOWN), e.g.,
  'ceph health mute OSD_DOWN'
- the mute matches the alert code and the short description (e.g.,
  "2 osds down")
  - this could be more specific, like matching the detail items too
  - or, it could be less specific, so that, e.g., an OSD_DOWN going
    from 2 to 1 osd won't unmute
  - or, individual detail items could be the things that get muted
  -> we might need to make alerts include more structured fields
     (besides a summary string and vector<string> of details) in order
     to make this work perfectly... but we can start simple (with just
     the summary string match?).
- the mute goes away if
  - the description changes
  - the alert resolves
  - the TTL/expiration time is reached
  - the user unmutes (the specific mute with 'ceph health unmute <code>',
    or all mutes with 'ceph health unmute')
- 'ceph -s' will say HEALTH_OK (if all alerts are muted), but *also*
  say how many muted alerts there are, e.g.

    cluster:
      id:     28f7427e-5558-4ffd-ae1a-51ec3042759a
      health: HEALTH_OK
              2 muted alerts: OSD_DOWN, TOO_MANY_PGS

    services:
      ...
- 'ceph health' will say HEALTH_OK (if all alerts are muted)
- 'ceph health detail' will say HEALTH_OK (if all alerts are muted),
  but will *also* show all of the muted alerts in a separate section
  (along with the mute TTL/expiration)
- the dashboard would show HEALTH_OK, plus some clear visual indication
  that there are one or more mutes, with an easy UI to mute/unmute

sage

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
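The mute lifecycle proposed above can be sketched in a few lines of Python. This is a minimal illustration, not Ceph code. It assumes the following:

- `Alert`, `Mute`, `add_mute`, `unmute`, `prune_mutes` and `summarize` are hypothetical names.
- Mutes match on the alert code plus the summary string only, which is the "start simple" option.
- A mute is dropped when its alert resolves, when the summary changes, when its TTL passes, or on an explicit unmute.
- The overall status ignores muted alerts, and the muted codes are reported alongside it, the way the 'ceph -s' example does.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical severity ordering; the names mirror Ceph's health states.
SEVERITY_RANK = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}


@dataclass
class Alert:
    code: str       # e.g. "OSD_DOWN"
    severity: str   # "HEALTH_WARN" or "HEALTH_ERR"
    summary: str    # e.g. "2 osds down"


@dataclass
class Mute:
    code: str
    summary: str                 # summary captured when the mute was created
    expires_at: Optional[float]  # absolute expiry time; None means no TTL


def add_mute(mutes: Dict[str, Mute], alert: Alert,
             ttl: Optional[float], now: float) -> None:
    """Mute an active alert, remembering its current summary for matching."""
    expires = None if ttl is None else now + ttl
    mutes[alert.code] = Mute(alert.code, alert.summary, expires)


def unmute(mutes: Dict[str, Mute], code: Optional[str] = None) -> None:
    """Drop the mute for one code, or every mute when no code is given."""
    if code is None:
        mutes.clear()
    else:
        mutes.pop(code, None)


def prune_mutes(mutes: Dict[str, Mute], alerts: List[Alert], now: float) -> None:
    """Remove mutes whose alert resolved, whose summary changed, or whose TTL passed."""
    active = {a.code: a for a in alerts}
    for code in list(mutes):
        m = mutes[code]
        alert = active.get(code)
        if (alert is None                                           # alert resolved
                or alert.summary != m.summary                       # description changed
                or (m.expires_at is not None and now >= m.expires_at)):  # TTL reached
            del mutes[code]


def summarize(alerts: List[Alert], mutes: Dict[str, Mute],
              now: float) -> Tuple[str, List[str]]:
    """Return the overall status (ignoring muted alerts) and the muted codes."""
    prune_mutes(mutes, alerts, now)
    status = "HEALTH_OK"
    muted: List[str] = []
    for a in alerts:
        if a.code in mutes:
            muted.append(a.code)
        elif SEVERITY_RANK[a.severity] > SEVERITY_RANK[status]:
            status = a.severity
    return status, muted


if __name__ == "__main__":
    mutes: Dict[str, Mute] = {}
    down = Alert("OSD_DOWN", "HEALTH_WARN", "2 osds down")
    add_mute(mutes, down, ttl=3600.0, now=0.0)
    print(summarize([down], mutes, now=10.0))  # ('HEALTH_OK', ['OSD_DOWN'])
    # One OSD comes back: the summary changes, so the mute is dropped.
    down = Alert("OSD_DOWN", "HEALTH_WARN", "1 osds down")
    print(summarize([down], mutes, now=20.0))  # ('HEALTH_WARN', [])
```

Keying mutes by code allows only one mute per warning code. Matching on detail items, or muting individual details, would need the more structured alert fields discussed above.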