On Wed, Mar 28, 2018 at 11:44 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> I had an amusing little problem today with a bug report about IO
> pausing on a cluster when OSDs are killed. Naturally, the first thing
> I wanted to do was see if it was the result of OSDs not getting marked
> down, or if the PGs were not peering quickly after that.
>
> Only it turns out that in Luminous, we no longer log the pg states to
> any single log I can find. ceph.log now contains only the health
> summary; I wasn't provided the mgr log but it appears to require debug
> 10 before printing out individual states.

Let's change that to something like 4 instead of 10 so that it's at
least easier to get at them directly on the daemon?

> This means the only way to
> get them is to have a high debug value while the logs are running (and
> I don't think this is something people are used to on the manager
> yet), and that any issues in the field will be difficult to resolve if
> they aren't immediately reproducible.

The purist answer is that the PG states are included in the prometheus
output, which is a neater way of getting this kind of history of
quantitative things. However, I'm not a purist, so...

> So: I'm pretty sure we need to log PG state changes in more detail by
> default. Does anybody have suggestions or preferences for *how* that
> happens? My preference is for them to show up in ceph.log...

... we could reinstate the PGMap spam at debug level in its own
channel in the cluster log, if we made LogMonitor keep separate
summary buffers for each channel. Currently it has one global buffer,
which means that any regular output (like the PGMap every 5 seconds)
will blow away the recent history of any other type of log message --
that was the motivation for eliminating the PGMap message rather than
just degrading it to debug.
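To illustrate the buffer problem, here is a rough Python sketch (not the actual LogMonitor code, which is C++). The class name, the channel names "cluster" and "pgmap", the buffer size of 3 and the message strings are all made up for the example. It only shows why one shared bounded buffer loses the interesting message while per-channel buffers keep it.

```python
from collections import deque


class PerChannelSummary:
    """Hypothetical stand-in: one bounded recent-message buffer per channel."""

    def __init__(self, max_per_channel=3):
        self.max_per_channel = max_per_channel
        self.buffers = {}

    def add(self, channel, message):
        # Create the channel's ring buffer lazily on first use.
        buf = self.buffers.setdefault(channel, deque(maxlen=self.max_per_channel))
        buf.append(message)

    def recent(self, channel):
        return list(self.buffers.get(channel, ()))


def demo():
    # One global bounded buffer shared by every channel (the current situation
    # described above), next to the per-channel alternative.
    shared = deque(maxlen=3)
    per_channel = PerChannelSummary(max_per_channel=3)

    # One interesting event followed by periodic chatter (illustrative strings).
    events = [("cluster", "osd.3 marked down")]
    events += [("pgmap", "pgmap v%d" % i) for i in range(5)]

    for channel, message in events:
        shared.append((channel, message))
        per_channel.add(channel, message)
    return shared, per_channel


if __name__ == "__main__":
    shared, per_channel = demo()
    print("shared buffer:", list(shared))
    print("cluster channel:", per_channel.recent("cluster"))
    print("pgmap channel:", per_channel.recent("pgmap"))
```

With the shared buffer, the periodic messages push the OSD event out; with per-channel buffers, the periodic channel only evicts its own history.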
John

> -Greg