Re: monitoring apply_latency / commit_latency ?

On Sat, Mar 25, 2023 at 11:09:58AM +0700, Konstantin Shalygin wrote:
> Hi Matthias,
> 
> The Prometheus exporter already has all these metrics; you can set up Grafana panels as you want.
> Also, apply latency is a metric for pre-BlueStore OSDs, i.e. Filestore.
> For BlueStore, apply latency is the same as commit latency; you can check this via `ceph osd perf`.


Thanks Konstantin,

am I right in guessing that the metrics shown in your screenshot
correspond to these values

  "bluestore.txc_commit_lat.description": "Average commit latency",
  "bluestore.txc_throttle_lat.description": "Average submit throttle latency",
  "bluestore.txc_submit_lat.description": "Average submit latency",
  "bluestore.read_lat.description": "Average read latency",

from "ceph daemon osd.X perf dump"?
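If it helps to make the relationship concrete: the latency entries in a
perf dump are cumulative counters, typically of the shape
{"avgcount": N, "sum": S} (with S in seconds on the releases I have
looked at; please verify on yours). A minimal Python sketch for pulling
a lifetime average out of a dump, with function names of my own
invention, could look like this:

```python
import json
import subprocess

def avg_latency(perf_dump: dict, section: str, metric: str) -> float:
    """Lifetime average latency in seconds: sum / avgcount.

    Assumes the usual perf-counter shape {"avgcount": N, "sum": S};
    the unit of S should be double-checked against your Ceph release.
    """
    entry = perf_dump[section][metric]
    count = entry["avgcount"]
    return entry["sum"] / count if count else 0.0

def osd_perf_dump(osd_id: int) -> dict:
    # Must run on the OSD host, with access to the admin socket.
    out = subprocess.check_output(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"])
    return json.loads(out)

# Illustration with a made-up dump (not real OSD data):
dump = {"bluestore": {"txc_commit_lat": {"avgcount": 2000, "sum": 10.0}}}
print(avg_latency(dump, "bluestore", "txc_commit_lat"))  # 0.005
```

Note this gives the average since the OSD started, which is not the
same thing as the 5-second window that "ceph osd perf" shows.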


And "ceph osd perf" output would correspond to
  "bluestore.txc_commit_lat.description": "Average commit latency",
or
  "filestore.apply_latency.description": "Apply latency",
  "filestore.journal_latency.description": "Average journal queue completing latency",
depending on OSD format?

It looks like "read_lat" is Bluestore only, and there is no comparable
value for Filestore.

There are other, format-agnostic OSD latency values:
  "osd.op_r_latency.description": "Latency of read operation (including queue time)",
  "osd.op_w_latency.description": "Latency of write operation (including queue time)",
  "osd.op_rw_latency.description": "Latency of read-modify-write operation (including queue time)",
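Since all of these are cumulative counters, a graphing setup (Grafana on
top of the Prometheus exporter, say) does not plot sum/avgcount directly
but the difference between two scrapes, which yields the average latency
over that window. A sketch of that calculation, again assuming the
{"avgcount": N, "sum": S} counter shape and using hypothetical sample
values:

```python
def interval_avg_latency(prev: dict, curr: dict) -> float:
    """Average latency over the window between two samples of one
    cumulative latency counter ({"avgcount": N, "sum": S}).

    This windowed average is what a rate()-style Grafana panel
    effectively shows, rather than the lifetime average.
    """
    d_count = curr["avgcount"] - prev["avgcount"]
    d_sum = curr["sum"] - prev["sum"]
    # No ops completed in the window -> report 0 rather than divide by 0.
    return d_sum / d_count if d_count > 0 else 0.0

# Two hypothetical samples of osd.op_w_latency, e.g. 5 s apart:
prev = {"avgcount": 1000, "sum": 2.0}
curr = {"avgcount": 1500, "sum": 3.5}
print(interval_avg_latency(prev, curr))  # 0.003
```

Here 500 writes completed in the window and spent 1.5 s in total, so the
panel would show 3 ms average write latency for that interval.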


More guesswork:
  - is osd.op_X_latency about the timing of client->OSD operations?
  - are the bluestore/filestore values about the timing of OSD->storage operations?

Please bear with me :-) I'm just trying to get a rough understanding of
what the numbers to be collected and graphed actually mean and how they
relate to each other.


Regards
Matthias

> > On 25 Mar 2023, at 00:02, Matthias Ferdinand <mf+ml.ceph@xxxxxxxxx> wrote:
> > 
> > Hi,
> > 
> > I would like to understand how the per-OSD data from "ceph osd perf"
> > (i.e.  apply_latency, commit_latency) is generated. So far I couldn't
> > find documentation on this. "ceph osd perf" output is nice for a quick
> > glimpse, but is not very well suited for graphing. Apparently the
> > output values are the most recent 5-second averages.
> > 
> > With "ceph daemon osd.X perf dump" OTOH, you get quite a lot of latency
> > metrics, but it is just not obvious to me how they aggregate into
> > apply_latency and commit_latency, or into some comparably simple read
> > latency metric (something that is missing completely from "ceph osd perf").
> > 
> > Can somebody shed some light on this?
> > 
> > 
> > Regards
> > Matthias
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx


