Re: debugging pg states

On Thu, 29 Mar 2018, John Spray wrote:
> On Wed, Mar 28, 2018 at 11:44 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > I had an amusing little problem today with a bug report about IO
> > pausing on a cluster when OSDs are killed. Naturally, the first thing
> > I wanted to do was see if it was the result of OSDs not getting marked
> > down, or if the PGs were not peering quickly after that.
> >
> > Only it turns out that in Luminous, we no longer log the pg states to
> > any single log I can find. ceph.log now contains only the health
> > summary; I wasn't provided the mgr log but it appears to require debug
> > 10 before printing out individual states.
> 
> Let's change that to something like 4 instead of 10 so that it's at
> least easier to get at them directly on the daemon?
> 
> > This means the only way to
> > get them is to have a high debug value while the logs are running (and
> > I don't think this is something people are used to on the manager
> > yet), and that any issues in the field will be difficult to resolve if
> > they aren't immediately reproducible.
> 
> The purist answer is that the PG states are included in the prometheus
> output, which is a neater way of getting this kind of history of
> quantitative things.  However, I'm not a purist, so...
> 
> > So: I'm pretty sure we need to log PG state changes in more detail by
> > default. Does anybody have suggestions or preferences for *how* that
> > happens? My preference is for them to show up in ceph.log...
> 
> ... we could reinstate the PGMap spam at debug level in its own
> channel in the cluster log, if we made LogMonitor keep separate
> summary buffers for each channel.  Currently it has one global buffer,
> which means that any regular output (like the PGMap every 5 seconds)
> will blow away the recent history of any other type of log message --
> that was the motivation for eliminating the PGMap message rather than
> just degrading it to debug.
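
To make the buffer problem described above concrete, here is a minimal
sketch in Python (not the actual LogMonitor C++ code). The class names,
the channel names "cluster" and "pgmap", and the buffer size of 50 are
all hypothetical, chosen only for illustration. It contrasts one global
bounded buffer with one bounded buffer per channel:

```python
from collections import defaultdict, deque


class SingleBufferSummary:
    """One global bounded buffer shared by every channel (hypothetical model)."""

    def __init__(self, max_entries=50):
        # Oldest entries are dropped once max_entries is reached.
        self.entries = deque(maxlen=max_entries)

    def add(self, channel, message):
        self.entries.append((channel, message))

    def recent(self, channel):
        # Only entries that have not yet been evicted can be returned.
        return [m for c, m in self.entries if c == channel]


class PerChannelSummary:
    """One bounded buffer per channel, so a chatty channel evicts only itself."""

    def __init__(self, max_entries=50):
        self.max_entries = max_entries
        self.channels = defaultdict(lambda: deque(maxlen=self.max_entries))

    def add(self, channel, message):
        self.channels[channel].append(message)

    def recent(self, channel):
        # .get() avoids creating an empty buffer for unknown channels.
        return list(self.channels.get(channel, ()))


def fill(summary, pgmap_count=100):
    """Log one rare cluster event, then a flood of periodic PGMap lines."""
    summary.add("cluster", "osd.3 marked down")
    for i in range(pgmap_count):
        summary.add("pgmap", f"pgmap v{i}")
    return summary


single = fill(SingleBufferSummary())
per_channel = fill(PerChannelSummary())
```

With the shared buffer, the single "cluster" entry is evicted by the
periodic "pgmap" entries; with per-channel buffers it is still there
afterwards, while "pgmap" keeps only its own most recent 50 entries.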

This is on Joao's todo list. Joao, do you have an estimate?  Or are
there other takers?

sage


