Re: [PATCH v1 01/25] structured-logging: design document

Jonathan Nieder <jrnieder@xxxxxxxxx> · Mon, 20 Aug 2018 21:47:24 -0700

Hi,

Jeff Hostetler wrote:

> Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> ---
>  Documentation/technical/structured-logging.txt | 816 +++++++++++++++++++++++++
>  1 file changed, 816 insertions(+)
>  create mode 100644 Documentation/technical/structured-logging.txt

Can you add this to Documentation/Makefile as well, so that an HTML
version will appear once this is merged?  See
https://public-inbox.org/git/20180814222846.GG142615@xxxxxxxxxxxxxxxxxxxxxxxxx/
for an example.

[...]
> +++ b/Documentation/technical/structured-logging.txt
> @@ -0,0 +1,816 @@
> +Structured Logging
> +==================
> +
> +Structured Logging (SLOG) is an optional feature to allow Git to
> +generate structured log data for executed commands.  This includes
> +command line arguments, command run times, error codes and messages,
> +child process information, time spent in various critical functions,
> +and repository data-shape information.  Data is written to a target
> +log file in JSON[1,2,3] format.

I like the idea of more structured logs for tracing, monitoring, and
diagnosis (see also
https://research.google.com/archive/papers/dapper-2010-1.pdf on the
subject of tracing).  My main focus in looking over this initial
contribution is

 1. Is this something that could eventually merge with our other
    tracing APIs (e.g. trace_printf), or will they remain separate?

 2. Is the API introduced here one that will be easy to work with
    and adapt?  What will calling code that makes use of it look like?

[...]
> +Background (Git Merge 2018 Barcelona)
> +=====================================
> +
> +Performance and/or error logging was discussed during the contributor's
> +summit in Barcelona.  Here are the relevant notes from the meeting
> +minutes[6].
> +
> +> Performance misc (Ævar)
> +> -----------------------
> +> [...]
> +>  - central error reporting for git
> +>    - `git status` logging
> +>    - git config that collects data, pushes to known endpoint with `git push`
> +>    - pre_command and post_command hooks, for logs
> +>    - `gvfs diagnose` that looks at packfiles, etc
[etc]

I'm not sure what to make of this section.  Is it a historical
artifact, or should I consider it part of the design?

[...]
> +A Quick Example
> +===============
> +
> +Note: JSON pretty-printing is enabled in all of the examples shown in
> +this document.  When pretty-printing is turned off, each event is
> +written on a single line.  Pretty-printing is intended for debugging.
> +It should be turned off in production to make post-processing easier.
> +
> +    $ git config slog.pretty <bool>
> +
> +Here is a quick example showing SLOG data for "git status".  This
> +example has all optional features turned off.  It contains 2 events.

Thanks for the example!  I'd also be interested in what the tracing
code looks like, since the API is the 'deepest' change involved (the
tracing format can easily change later based on experience).

My initial reaction to the example trace is that it feels like an odd
compromise: on one hand it uses text (JSON) to be human-readable, and
on the other hand it is structured to be machine-readable.  The JSON
writer isn't using a standard JSON serialization library so parsing
difficulties seem likely, and the output is noisy enough that reading
it by hand is hard, too.  That leads me to wish for a different
format, like protobuf (where the binary form is very concise and
parseable, and the textual form is IMHO more readable than JSON).

All that said, this is just the first tracing output format and it's
easy to make new ones later.  It seems fine for that purpose.

[...]
> +Target Log File
> +===============
> +
> +SLOG writes events to a log file.  File logging works much like
> +GIT_TRACE where events are appended to a file on disk.

Does it write with O_APPEND?  Does it do anything to guard against
interleaved events --- e.g. are messages buffered and then written
with a single write() call?

[...]
> +Comparison with GIT_TRACE
> +=========================
> +
> +SLOG is very similar to the existing GIT_TRACE[4] API because both
> +write event messages at various points during a command.  However,
> +there are some fundamental differences that warrant it being
> +considered a separate feature rather than just another
> +GIT_TRACE_<key>:

The natural question would be in the opposite direction: can and
should trace_printf be reimplemented on top of this feature?

[...]
> +    
> +    

nit: trailing whitespace (you can find these with "git show --check").

[...]
> +A session id (SID) is a cheap, unique-enough string to associate all
> +of the events generated by a single process.  A child git process inherits
> +the SID of their parent git process and incorporates it into their SID.

I wonder if we can structure events in a more hierarchical way to
avoid having to special-case the toplevel command in this way.

[...]
> +    # The "cmd_exit" event includes the command result and elapsed
> +    # time and the various configuration settings.  During the run
> +    # "index" category timers were placed around the do_read_index()
> +    # and "preload()" calls and various "status" category timers were
> +    # placed around the 3 major parts of the status computation.
> +    # Lastly, an "index" category "aux" data item was added to report
> +    # the size of the index.

Especially for tracing and monitoring applications, it would be
helpful if more of the information were emitted earlier instead of
waiting for exit.  Especially if I am trying to trace the cause of a
hanging command, information that is only printed at exit does not
help me.

The main missing bit in this doc is a sketch of the API.  Looking
forward to finding that in the header. ;-)

Sorry for the slow review, and thanks for your thoughtful work.

Sincerely,
Jonathan