Re: Proposition - latency histogram

On Mon, Nov 28, 2016 at 4:51 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Mon, 28 Nov 2016, Bartłomiej Święcki wrote:
>> Hi,
>>
>> Currently we can query an OSD for op latency, but it's given only as an
>> average. An average may not give the best information in this case, e.g.
>> spikes can easily get hidden there.
>>
>> Instead of an average we could easily keep a simple histogram: quantize
>> the latency into a predefined set of time intervals, keep a simple
>> performance counter for each of them, and increment one of them at each op.
>> Since those are per OSD, we could have pretty high resolution with very
>> little memory usage. The performance impact should be negligible, since only
>> one (two if split into read and write) of those counters would be
>> incremented per OSD op.
>>
>> In addition we could also do this in 2D, with each counter matching a given
>> latency range and op size range. Having such a 2D table would show the
>> latency histogram, the request size histogram and combinations of those
>> (e.g. a latency histogram of ~4k ops only).
>>
>> What do you think about this idea? I can prepare some code; a simple proof
>> of concept looks really straightforward to implement.
>
> This sounds like a great idea.  I think the main issue is that the data
> won't be easily exposed via the perfcounter interface... at least not in a
> way that generic tools can visualize.  Unless there is a standardish way
> to report histogram metrics?
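
For illustration, the 2D counter table proposed in the quoted message could
be sketched roughly as follows. This is Python for brevity, not Ceph code:
the bucket boundaries, the class name Histogram2D and its methods are all
hypothetical, since the proposal does not specify any of them.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds; the proposal does not specify any.
LATENCY_BOUNDS_US = [100, 1_000, 10_000, 100_000]   # microseconds
SIZE_BOUNDS_BYTES = [4096, 65536, 1048576]          # bytes


class Histogram2D:
    """Fixed grid of counters indexed by (latency bucket, size bucket)."""

    def __init__(self, lat_bounds, size_bounds):
        self.lat_bounds = list(lat_bounds)
        self.size_bounds = list(size_bounds)
        # One extra row/column catches values above the last bound.
        self.counts = [[0] * (len(self.size_bounds) + 1)
                       for _ in range(len(self.lat_bounds) + 1)]

    def record(self, latency_us, size_bytes):
        # Exactly one counter is incremented per op. Each bound is an
        # inclusive upper limit, so a 4096-byte op lands in the first
        # size bucket.
        row = bisect_left(self.lat_bounds, latency_us)
        col = bisect_left(self.size_bounds, size_bytes)
        self.counts[row][col] += 1

    def latency_histogram(self):
        # Sum over all size buckets: the plain 1D latency histogram.
        return [sum(row) for row in self.counts]

    def size_histogram(self):
        # Sum over all latency buckets: the request size histogram.
        return [sum(col) for col in zip(*self.counts)]

    def latency_histogram_for_size(self, size_bytes):
        # Latency histogram restricted to one size bucket (e.g. ~4k ops).
        col = bisect_left(self.size_bounds, size_bytes)
        return [row[col] for row in self.counts]
```

With this layout, recording an op is two binary searches plus one increment,
and both 1D histograms fall out as row or column sums of the same table.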

Newer tools are waking up to the need for histograms, e.g. Prometheus
has a histogram datatype:
https://prometheus.io/docs/concepts/metric_types/#histogram

Someone has done some work on adding support in grafana:
https://github.com/grafana/grafana/issues/600

Should be reasonably straightforward to add a histogram type to the
perf counters: people might end up flattening it to a series of scalar
time series with _bucket suffixes or whatever, but I'd definitely be
in favour of us adding an explicit histogram type internally.
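
As a rough sketch of what that flattening could look like, the snippet below
turns per-bucket counts into cumulative series in the style of the Prometheus
histogram type (a _bucket series per upper bound with an "le" label, plus
_count). The function name flatten_histogram and the metric name are
hypothetical; nothing like this exists in the perf counter code today.

```python
def flatten_histogram(name, bounds, counts):
    """Flatten per-bucket counts into cumulative (series_name, value) pairs.

    `bounds` are inclusive upper bounds; `counts` has one entry per bound
    plus a final overflow entry for values above the last bound.
    """
    if len(counts) != len(bounds) + 1:
        raise ValueError("counts must have len(bounds) + 1 entries")

    series = []
    running = 0
    for bound, count in zip(bounds, counts):
        # Prometheus-style buckets are cumulative: each one includes
        # everything at or below its upper bound.
        running += count
        series.append((f'{name}_bucket{{le="{bound}"}}', running))

    # The overflow bucket is reported with an upper bound of +Inf.
    running += counts[-1]
    series.append((f'{name}_bucket{{le="+Inf"}}', running))
    series.append((f'{name}_count', running))
    return series


if __name__ == "__main__":
    # Hypothetical metric name and bucket bounds, for illustration only.
    for series_name, value in flatten_histogram(
            "op_latency_us", [100, 1000], [3, 2, 1]):
        print(series_name, value)
```

Keeping the internal type as plain per-bucket counters and doing the
cumulative conversion only at export time would leave the hot path at one
increment per op.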

John
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


