Hi, Jeff Hostetler wrote: > Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > --- > Documentation/technical/structured-logging.txt | 816 +++++++++++++++++++++++++ > 1 file changed, 816 insertions(+) > create mode 100644 Documentation/technical/structured-logging.txt Can you add this to Documentation/Makefile as well, so that an HTML version will appear once this is merged? See https://public-inbox.org/git/20180814222846.GG142615@xxxxxxxxxxxxxxxxxxxxxxxxx/ for an example. [...] > +++ b/Documentation/technical/structured-logging.txt > @@ -0,0 +1,816 @@ > +Structured Logging > +================== > + > +Structured Logging (SLOG) is an optional feature to allow Git to > +generate structured log data for executed commands. This includes > +command line arguments, command run times, error codes and messages, > +child process information, time spent in various critical functions, > +and repository data-shape information. Data is written to a target > +log file in JSON[1,2,3] format. I like the idea of more structured logs for tracing, monitoring, and diagnosis (see also https://research.google.com/archive/papers/dapper-2010-1.pdf on the subject of tracing). My main focus in looking over this initial contribution is 1. Is this something that could eventually merge with our other tracing APIs (e.g. trace_printf), or will they remain separate? 2. Is the API introduced here one that will be easy to work with and adapt? What will calling code that makes use of it look like? [...] > +Background (Git Merge 2018 Barcelona) > +===================================== > + > +Performance and/or error logging was discussed during the contributor's > +summit in Barcelona. Here are the relevant notes from the meeting > +minutes[6]. > + > +> Performance misc (Ævar) > +> ----------------------- > +> [...] > +> - central error reporting for git > +> - `git status` logging > +> - git config that collects data, pushes to known endpoint with `git push` > +> - pre_command and post_command hooks, for logs > +> - `gvfs diagnose` that looks at packfiles, etc [etc] I'm not sure what to make of this section. Is it a historical artifact, or should I consider it part of the design? [...] > +A Quick Example > +=============== > + > +Note: JSON pretty-printing is enabled in all of the examples shown in > +this document. When pretty-printing is turned off, each event is > +written on a single line. Pretty-printing is intended for debugging. > +It should be turned off in production to make post-processing easier. > + > + $ git config slog.pretty <bool> > + > +Here is a quick example showing SLOG data for "git status". This > +example has all optional features turned off. It contains 2 events. Thanks for the example! I'd also be interested in what the tracing code looks like, since the API is the 'deepest' change involved (the tracing format can easily change later based on experience). My initial reaction to the example trace is that it feels like an odd compromise: on one hand it uses text (JSON) to be human-readable, and on the other hand it is structured to be machine-readable. The JSON writer isn't using a standard JSON serialization library so parsing difficulties seem likely, and the output is noisy enough that reading it by hand is hard, too. That leads me to wish for a different format, like protobuf (where the binary form is very concise and parseable, and the textual form is IMHO more readable than JSON). All that said, this is just the first tracing output format and it's easy to make new ones later. It seems fine for that purpose. [...] > +Target Log File > +=============== > + > +SLOG writes events to a log file. File logging works much like > +GIT_TRACE where events are appended to a file on disk. Does it write with O_APPEND? Does it do anything to guard against interleaved events --- e.g. are messages buffered and then written with a single write() call? [...] > +Comparison with GIT_TRACE > +========================= > + > +SLOG is very similar to the existing GIT_TRACE[4] API because both > +write event messages at various points during a command. However, > +there are some fundamental differences that warrant it being > +considered a separate feature rather than just another > +GIT_TRACE_<key>: The natural question would be in the opposite direction: can and should trace_printf be reimplemented on top of this feature? [...] > + > + nit: trailing whitespace (you can find these with "git show --check"). [...] > +A session id (SID) is a cheap, unique-enough string to associate all > +of the events generated by a single process. A child git process inherits > +the SID of their parent git process and incorporates it into their SID. I wonder if we can structure events in a more hierarchical way to avoid having to special-case the toplevel command in this way. [...] > + # The "cmd_exit" event includes the command result and elapsed > + # time and the various configuration settings. During the run > + # "index" category timers were placed around the do_read_index() > + # and "preload()" calls and various "status" category timers were > + # placed around the 3 major parts of the status computation. > + # Lastly, an "index" category "aux" data item was added to report > + # the size of the index. Especially for tracing and monitoring applications, it would be helpful if more of the information were emitted earlier instead of waiting for exit. Especially if I am trying to trace the cause of a hanging command, information that is only printed at exit does not help me. The main missing bit in this doc is a sketch of the API. Looking forward to finding that in the header. ;-) Sorry for the slow review, and thanks for your thoughtful work. Sincerely, Jonathan