On Fri, Mar 15 2019, Jeff Hostetler wrote:

> On 3/13/2019 7:49 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Mar 14 2019, Josh Steadmon wrote:
>>
>>> When the value of a trace2 environment variable contains instances of
>>> the string "%ISO8601%", expand them into the current UTC timestamp in
>>> ISO 8601 format.
>>
>> Any reason not to just support feeding the path to strbuf_addftime(), to
>> e.g. support a daily/hourly log?
>>
>>> When the value of a trace2 environment variable is an absolute path
>>> referring to an existing directory, write output to randomly-named
>>> files under the given directory. If the value is an absolute path
>>> referring to a non-existent file and ends with a dash, use the value as
>>> a prefix for randomly named files.
>>>
>>> The random filenames will consist of the value of the environment
>>> variable (after potential timestamp expansion), followed by a 6
>>> character random string such as would be produced by mkstemp(3).
>>>
>>> This makes it more convenient to collect traces for every git
>>> invocation by unconditionally setting the relevant trace2 envvar to a
>>> constant directory name.
>>
>> Hrm, api-trace2.txt already specifies that the "sid" is going to be
>> unique, couldn't we just have some mode where we use that?
>>
>> But then of course when we have nested processes will contain slashes,
>> so we'd either run into deep nesting or need to munge the slashes, in
>> which case we might bump against a file length limit (although I haven't
>> seen process trees deeper than 3-4).
>
> Using the "sid" would be a good place to start. Just take the final
> component in the string (after the last slash or the whole sid if there
> are no slashes). That will give you a filename with microseconds since
> epoch of the command's start time and the PID.
>
> That should be unique, should not require random strings, and not go
> deep in the filesystem. And it will let you correlate files between
> child and parent commands, if you need to.
>
> So maybe if GIT_TR2_* is set to a directory, we append the final portion
> of the "sid" and create a file inside that directory.
>
>>
>> Just to pry about the use-case since I'm doing similar collecting, why
>> are you finding this easier to process?
>>
>> With the current O_APPEND semantics you're (unless I've missed
>> something) guaranteed to get a single process tree in nested order,
>> whereas with this they'll all end up in separate files and you'll need
>> to slurp them up, sort the whole thing and stitch it together yourself
>> without the benefit of stream-parsing it where you can cheat a bit
>> knowing that e.g. a "reflog expire" entry is always coming after the
>> corresponding "gc" that invoked it.
>>
>
> Yes, with O_APPEND, you should get a series of events as they happen
> on the system all properly interleaved. And see concurrent activity.
> This file should let you grep to see individual processes if you want
> to.
>
> Routing each command to a different file is fine if you want, but
> that opens you up to having to manage and delete them.
>
> Whether to have 1 file (with occasional rotation) or 1 file-per-command
> depends, I guess, on how you want to process them.
>
> I'm routing the Trace2 data to a named-pipe/socket and have a daemon
> collecting and filtering, so I have a single pathname for output and
> yet get the per-file stream handling that I think Josh is looking for.

Is the collecting code something you can share & general enough that it
might be useful for others?
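
For reference, the sid-based naming Jeff describes above could be sketched
roughly as follows. This is only an illustration in Python, not code from
git or from any patch in this thread. The function name
trace_path_for_sid and the sample sid values are made up, and it assumes
the sid is a slash-separated string whose last component identifies the
current process.

```python
import os


def trace_path_for_sid(target_dir: str, sid: str) -> str:
    """Return a per-command trace file path inside target_dir.

    Hypothetical helper: assumes nested processes yield a sid of the
    form "<parent>/<child>", so only the last component is kept.
    """
    # Take everything after the last slash, or the whole sid if there
    # are no slashes, so the path never nests below target_dir.
    leaf = sid.rsplit("/", 1)[-1]
    return os.path.join(target_dir, leaf)


# Made-up sid values, shaped like "<microseconds>-<pid>" per component.
top_level = trace_path_for_sid("/tmp/traces", "1552600000000000-1234")
nested = trace_path_for_sid(
    "/tmp/traces", "1552600000000000-1234/1552600000123456-1240"
)
```

Since the parent's component is dropped from the filename, matching a
child's file to its parent would rely on the full sid recorded inside the
trace data itself.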