Re: towards a user-mode diagnostic log mechanism

Colin McCabe <cmccabe@xxxxxxxxxxxxxx> · Thu, 5 Jan 2012 20:09:26 -0800

On Mon, Dec 19, 2011 at 6:42 PM, Mark Kampe <mark.kampe@xxxxxxxxxxxxx> wrote:
> I'd like to keep this ball moving ... as I believe that the
> limitations of our current logging mechanisms are already
> making support difficult, and that is about to become worse.
>
> As a first step, I'd just like to get opinions on the general
> requirements we are trying to satisfy, and decisions we have
> to make along the way.
>
> Comments?
>
> I Requirements
>
>  A. Primary Requirements (must have)
>     1. information captured
>        a. standard: time, sub-system, level, proc/thread

We already do capture all this stuff, just not efficiently.

One easy way to get a lot of efficiency would be to add a thread-local
variable to store the thread name / ID.

Getting the system time is a surprisingly expensive operation, and
this poses a problem for logging system designers.  You can use the
rdtsc CPU instruction, but unfortunately on some architectures CPU
frequency scaling makes it very inaccurate.  Even when it is accurate,
it's not synchronized between multiple CPUs.

Another option is to without time for most messages and just have a
periodic timestamp that gets injected every so often-- perhaps every
second, for example.

>        b. additional: operation and parameters
>        c. extensible for new operations
>     2. efficiency
>        a. run time overhead < 1%
>           (I believe this requires delayed flush circular bufferring)

The current code could be changed to support delayed flush without too
much trouble.  In order to do that:

1. The message timestamp needs to be added by dout() rather than by
DoutStreambuf::overflow()

2. dendl needs to stop flushing the stream.  This part is simple...
just change the dendl macro to add "\n" rather than std::endl.

3. A signal handler needs to be added that flushes the dout stream
before program exit.

#1 is probably the most complex part.

I never added support for delayed flush because I was afraid that it
would prevent us from debugging bugs which killed processes without
triggering the signal handlers.  For example, when a process is killed
by the OOM killer, it does not get a chance to clean up, and so the
logs will be incomplete unless they are being continuously flushed.

Perhaps Ceph has matured to the point where this makes sense to add,
at least as an option.

>        b. persistent space O(Gigabytes per node-year)
>     3. configurability
>        a. capture level per sub-system
>     4. persistence
>        a. flushed out on process shut-down
>        b. recoverable from user-mode core-dumps
>     5. presentation
>        a. output can be processed w/grep,less,...
>
>  B. Secondary Requirements (nice to have)
>     1. ease of use
>        a. compatible with/convertable from existing calls
>        b. run-time definition of new event records
>     2. configurability
>        a. size/rotation rules per sub-system
>        b. separate in-memory/on-disk capture levels
>
> II Decisions to be made
>
>   A. Capture Circumstances
>      1. some subset of procedure calls
>         (I'm opposed to this, but it is an option)

Logging all function calls is very slow.  Trust me on this, I wrote a
profiler that did this once...  I used -finstrument-functions.

>      2. explicit event logging calls
>
>   B. Capture Format
>      1. ASCII text
>      2. per-event binary format
>      3. binary header + ASCII text
>
>   C. Synchronization
>      1. per-process vs per-thread buffers
>
>   D. Flushing
>      1. last writer flushes vs dedicated thread
>      2. single- vs double-bufferred output
>
>   E. Available open source candidates

Pantheios and log4cpp are two potential candidates.  I don't know that
much about either, unfortunately.

Honestly, I feel like logging is something that you will end up
customizing pretty extensively to suit the needs of your application.
But perhaps it's worth checking out what these libraries provide--
especially in terms of efficiency.

The biggest efficiency killer in logging right now is that dout lock.
However, even if that lock was replaced by an atomic operation, there
would still be the issue of cacheline ping-pong to contend with.  In
order to scale to the hundred-core systems of the future, you have to
get rid of that shared state.

P.S. Happy new year.

cheers,
Colin

>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html