Re: First implementation of logging with LTTng

On Thu, 7 Jun 2018, Mohamad Gebai wrote:
> Hi all,
> 
> I've opened PR #22458 [1] as a first implementation of Ceph using LTTng
> for logging. I've only converted Bluestore.cc for now, and I've used fio
> with the objectstore backend to benchmark it. Early results are somewhat
> promising:
> 
> - With dout and debug bluestore = 10/0:
>     Average throughput = 119 MiB/s
>     Average IOPS = 13.68
>     Disk usage: 48M
> 
> - With LTTng:
>     Average throughput = 125 MiB/s
>     Average IOPS = 14.09
>     Disk usage: 23M
> 
> The throughput difference varies with the number of fio jobs and queue
> depth. I'll add more benchmarking numbers to the PR as I run them.
> 
> This is an early implementation and there's plenty of room for
> improvement, but here are the highlights: I've taken the approach from
> the QEMU project, which uses a script (tracetool.py) to generate the
> instrumentation code at configure-time. More detail can be found in the
> README of the PR.
> 
> All the trace session management still needs to be added somewhere
> (maybe in service files). For now, you'll need to start/stop tracing
> manually. On another note, there's a new feature in the upcoming 2.10
> version of LTTng that will greatly benefit us if we decide to go with
> LTTng: session rotation. Session rotation is similar to what we have
> with dout and log files: periodically sealing the trace file and moving
> on to a new one. This means the sealed traces can be compressed and
> sent to someone for analysis, which we can't do with the current version
> of LTTng. Session rotation can be size-based or time-based.
> 
> The major improvements we get with LTTng are that it uses per-cpu
> buffers to store the traces, and that it produces a compact binary
> format (CTF), which means we save significantly in terms of disk usage.
> A deactivated tracepoint is basically a nop, so disabled tracepoints
> cost very little (both in time and in instruction-cache friendliness).
> 
> As a next step, I was thinking of porting more dout()s to tracepoints,
> running Ceph (instead of fio's objectstore) on a real cluster, and
> looking for any improvements or regressions.
> 
> Any thought about the approach? How can I move this forward?

This is a huge improvement over manually writing the .tp files!  I have 
one thought, though.  It's still necessary both to adjust the tracepoint 
location, e.g.,

 
 -      dout(30) << __func__ << " " << *(b_it->first)
 -               << " expected4release=" << blob_expected_for_release
 -               << " expected_allocations=" << bi.expected_allocations
 -               << dendl;
 +      trace_process_protrusive_extents_expected4release(*(b_it->first),
 +       blob_expected_for_release, bi.expected_allocations);

and also to include the line in tracing/tracetool/subsys, e.g.,

 + process_protrusive_extents_expected4release(Blob blob, int64_t blob_expected_for_release, int64_t expected_allocations) "%s expected4release=%llu expected_allocations=%llu" 30

Do you think it's possible to declare these inline and generate the 
tracing/tracetool/subsys files from the source?  For example, write 
something like

        trace(30, process_protrusive_extents_expected4release,
		Blob *(b_it->first),
		int64_t blob_expected_for_release,
		int64_t bi.expected_allocations,
		"%s expected4release=%llu expected_allocations=%llu");

A macro could expand that out to the function call, and a preprocessor 
pass could slurp these up and generate the file for input to 
tracetool (or tracetool could extract them from the source 
directly).
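
As a rough illustration of that preprocessor pass, here is a sketch (in the
spirit of tracetool.py) that slurps trace() calls out of C++ source and emits
subsys-style lines. The trace() call syntax, the "first token is the type,
the rest is the expression" convention, and the output format are assumptions
taken from the examples above, not an existing Ceph or QEMU tool:

```python
import re

# Match a trace(level, name, <type expr>..., "format"); call, possibly
# spanning several lines. Assumes no quotes appear in the arguments.
TRACE_RE = re.compile(
    r'trace\(\s*(?P<level>\d+)\s*,\s*(?P<name>\w+)\s*,'
    r'(?P<args>[^"]*)'
    r'"(?P<fmt>[^"]*)"\s*\)\s*;',
    re.DOTALL)

def extract_tracepoints(source):
    """Yield one subsys-style line per trace() call found in source."""
    for m in TRACE_RE.finditer(source):
        decls = []
        for arg in m.group('args').split(','):
            arg = arg.strip()
            if not arg:
                continue
            # First token is the type, the rest is the C++ expression;
            # squash the expression into an identifier-ish parameter name.
            type_tok, _, expr = arg.partition(' ')
            pname = re.sub(r'\W+', '_', expr).strip('_')
            decls.append('%s %s' % (type_tok, pname))
        yield '%s(%s) "%s" %s' % (
            m.group('name'), ', '.join(decls), m.group('fmt'),
            m.group('level'))

src = '''
        trace(30, process_protrusive_extents_expected4release,
                Blob *(b_it->first),
                int64_t blob_expected_for_release,
                int64_t bi.expected_allocations,
                "%s expected4release=%llu expected_allocations=%llu");
'''
for line in extract_tracepoints(src):
    print(line)
```

Run over the example above, this emits a single subsys line for
process_protrusive_extents_expected4release with the three typed arguments
and the format string, so the source stays the single point of truth.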

It's still a bit more awkward than dout was in that (1) you have to name 
the tracepoint and (2) you have to specify the types of the arguments.  
That might be a blessing in disguise in that it makes you think more 
carefully about the tracepoints.  OTOH, having to edit two files and 
match up arguments and types seems needlessly tedious?

sage
