First implementation of logging with LTTng

Hi all,

I've opened PR #22458 [1] as a first implementation of Ceph using LTTng
for logging. I've only converted BlueStore.cc for now, and I've used fio
with the objectstore backend to benchmark it. Early results are somewhat
promising:

- With dout and debug bluestore = 10/0:
    Average throughput = 119 MiB/s
    Average IOPS = 13.68
    Disk usage: 48M

- With LTTng:
    Average throughput = 125 MiB/s
    Average IOPS = 14.09
    Disk usage: 23M

The throughput difference varies with the number of fio jobs and queue
depth. I'll add more benchmarking numbers to the PR as I run them.
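
For anyone who wants to reproduce the setup, a fio job along these
lines should work. The objectstore engine lives under src/test/fio;
the engine and option names below are approximate, so check them
against the tree before relying on this sketch:

    [global]
    # external objectstore engine built from src/test/fio (name is an assumption)
    ioengine=libfio_ceph_objectstore.so
    # minimal ceph.conf pointing at a BlueStore data directory (assumption)
    conf=ceph-bluestore.conf
    rw=randwrite
    bs=4k
    iodepth=16
    numjobs=1
    time_based=1
    runtime=60

    [bluestore]
    size=256m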

This is an early implementation and there's plenty of room for
improvement, but here's the gist: I've taken the approach used by the
QEMU project, which relies on a script (tracetool.py) to generate the
instrumentation code at configure time. More detail can be found in the
PR's README.
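
To give a flavour of what the generated instrumentation boils down to,
here is a hand-written LTTng-UST tracepoint for a BlueStore-style event.
This is not the literal output of tracetool.py, and the provider, event
and field names are made up for illustration:

    /* bluestore_tp.h -- illustrative sketch, names are hypothetical */
    #undef TRACEPOINT_PROVIDER
    #define TRACEPOINT_PROVIDER bluestore

    #undef TRACEPOINT_INCLUDE
    #define TRACEPOINT_INCLUDE "./bluestore_tp.h"

    #if !defined(BLUESTORE_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
    #define BLUESTORE_TP_H

    #include <lttng/tracepoint.h>
    #include <stdint.h>

    TRACEPOINT_EVENT(
        bluestore,               /* provider */
        queue_transaction,       /* event name (hypothetical) */
        TP_ARGS(const char *, coll, uint64_t, bytes),
        TP_FIELDS(
            ctf_string(collection, coll)
            ctf_integer(uint64_t, bytes, bytes)
        )
    )

    #endif /* BLUESTORE_TP_H */

    #include <lttng/tracepoint-event.h>

    /* A separate .cc defines TRACEPOINT_CREATE_PROBES / TRACEPOINT_DEFINE
       and includes this header to emit the probe definitions. */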

All the trace session management still needs to be added somewhere
(maybe in service files). For now, you'll need to start and stop
tracing manually (example commands below). On another note, there's a
new feature in the upcoming 2.10 version of LTTng that would greatly
benefit us if we decide to go with LTTng: session rotation. Session
rotation is similar to what we have with dout and log files: the trace
file is periodically sealed and tracing moves on to a new one. This
means the sealed traces can be compressed and sent to someone for
analysis, which we can't do with the current version of LTTng. Rotation
can be size-based or time-based.
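
Concretely, until session management is wired into service files,
manual tracing with the stock lttng CLI looks like this (the session
name, output path and provider wildcard are just placeholders):

    lttng create ceph-bluestore --output=/var/tmp/ceph-traces
    lttng enable-event --userspace 'bluestore:*'
    lttng start
    # ... run the workload ...
    lttng stop
    lttng destroy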

The major improvements we get with LTTng are that it uses per-CPU
buffers to store the traces, and that it produces a compact binary
format (CTF), which saves significantly on disk usage. A deactivated
tracepoint is basically a nop, so disabled tracepoints cost very
little, both in execution time and in instruction-cache footprint.
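
Since CTF is a binary format, the traces are read back with
babeltrace, e.g. using the output path from the sketch above:

    babeltrace /var/tmp/ceph-traces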

As a next step, I was thinking of porting more dout()s to tracepoints,
and then running Ceph itself (instead of fio's objectstore backend) on
a real cluster to look for any improvements or setbacks.
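
For reference, such a conversion at the call site would look roughly
like this, reusing the hypothetical event from the sketch above
(coll_name is assumed to be a std::string in scope):

    // before: textual logging through dout
    dout(10) << __func__ << " queued " << bytes << " bytes in "
             << coll_name << dendl;

    // after: binary tracepoint (provider/event/field names are made up)
    tracepoint(bluestore, queue_transaction, coll_name.c_str(), bytes);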

Any thoughts on the approach? How can I move this forward?

Mohamad

[1] https://github.com/ceph/ceph/pull/22458
