Re: logging in seastar-osd

Mohamad Gebai <mgebai@xxxxxxx> · Wed, 25 Apr 2018 10:05:07 -0400

On 04/25/2018 09:21 AM, Sage Weil wrote:
> On Wed, 25 Apr 2018, Mohamad Gebai wrote:
>> On 04/25/2018 04:46 AM, kefu chai wrote:
>>> On Wed, Apr 25, 2018 at 12:59 AM, Adam C. Emerson <aemerson@xxxxxxxxxx> wrote:
>>>> On 25/04/2018, kefu chai wrote:
>>>> [snip]
>>>> I have thought off and on and chatted with a few others about the
>>>> using a binary log, since the runtime cost of all that stringification
>>>> at high log levels is Not Insignificant.
>>> yeah, i recall the discussion in a performance meeting the other day.
>>> so are we going to have a dictionary for each log entry? and for each
>>> log entry, it will contain <index, variable length blob>. we will ship a tool
>>> which embeds a dictionary, in which we can lookup a log entry by its
>>> index to find out <a fmt string, a list of indices into another dictionary>.
>>> in "another" dictionary, it contains the recipes for printing various objects
>>> in Ceph.
>> I'm in favor of this approach. I had done some (quick) analysis and from
>> what I had seen the string copying was the culprit in the logging
>> mechanism, rather than the locking. I replaced the dout() mechanism to
>> use LTTng tracepoints instead of the in-Ceph logging (bypassing the
>> locking and the linked list of log entries) and there wasn't much
>> improvement. But again, this was on a tiny cluster, so more testing
>> would need to be done to really come to that conclusion.
>> I had started using libbabeltrace to write that binary format, which can
>> then be read using babeltrace(1), but from what I've seen, libbabeltrace
>> isn't really fit for fast logging at run-time. We can rewrite this part
>> though, as Kefu is suggesting.
>>
>> The good thing about using our own formats is that we wouldn't need to
>> have a single plain text .log file with all log entries. This will allow
>> us to write per-CPU or per-thread binary log files, and merge them in
>> the binary tool (like many tracers do). It's not a trivial task though.
>> If the gain is too little, we might as well keep the current dout()
>> infrastructure and improve parts of it.
> I'm unclear as to what the benefit is in writing our own binary log format 
> instead of just using a existing tracing tool.  Aren't the tracing 
> tools going to be more sophisticated, better optimized, and lower 
> overhead (and already written)?  I would expect LTTng to be very fast...

It is very fast, but the downside would be usability. To move all logs
to LTTng tracepoints means that someone needs to create a tracing
session, make sure the lttng-session daemon is running, enable the right
events, start the tracer (this can be done in Ceph using the LTTng's C
API), think about what happens when a Ceph process crashes, what happens
when LTTng crashes, what happens if someone on the system inadvertently
stops the tracing system-wide, etc. Some of these issues can be
overcome, though. In summary: it's not the default use case of LTTng,
but there is a way if we wanted to go down that path.
I've heard of a suggestion that we could also use a mix of both: dout()
for lower log_levels and LTTng for for higher ones.

Mohamad

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html