Re: [PATCH v3 1/1] trace2: write to directory targets

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Mon, 25 Mar 2019 12:29:32 -0400

On 3/23/2019 4:44 PM, Ævar Arnfjörð Bjarmason wrote:

On Thu, Mar 21 2019, Josh Steadmon wrote:

When the value of a trace2 environment variable is an absolute path
referring to an existing directory, write output to files (one per
process) underneath the given directory. Files will be named according
to the final component of the trace2 SID, followed by a counter to avoid
potential collisions.
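
As a rough illustration of the naming scheme described above (a Python sketch, not the patch's C code): the file is named after the last SID component and created exclusively, and on a collision a numeric suffix is tried. The "/" separator, the ".N" suffix form, the attempt limit, and the name `open_trace_file` are assumptions made for this example.

```python
import os


def open_trace_file(directory, sid, max_attempts=10):
    """Create a new trace file under `directory`, named after the last SID component.

    Assumptions for illustration: SID components are separated by "/", and
    collisions are resolved by appending ".1", ".2", ... to the base name.
    """
    base = sid.rsplit("/", 1)[-1]
    for attempt in range(max_attempts):
        name = base if attempt == 0 else "{}.{}".format(base, attempt)
        path = os.path.join(directory, name)
        try:
            # O_EXCL makes creation fail if the file already exists, so two
            # processes can never end up sharing one output file.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
        except FileExistsError:
            continue
        return fd, path
    raise RuntimeError("no free trace file name for SID component " + base)
```

Exclusive creation is what makes the counter safe here: checking for existence first and then opening would leave a window for two processes to pick the same name.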

[...]

The reason I'm raising this is that it seems like sweeping an existing
issue under the rug. We document that the "sid" is "unique", and it's just:

     <nanotime / 1000 (i.e. *nix time in microseconds)>-<pid>

So that might be a lie. In particular, I can imagine that if, say,
every machine at Google is logging traces into some magic mounted FS,
there'll be collisions there.

But then let's *fix that*, because we're also e.g. going to have other
consumers of these traces using the SIDs as primary keys in a logging
system.

I wonder if we should just make it a bit longer, human-readable, and
include a hash of the hostname:

     perl -MTime::HiRes=gettimeofday -MSys::Hostname -MDigest::SHA=sha1_hex -MPOSIX=strftime -wE '
         my ($t, $m) = gettimeofday;
         my $host_hex = substr sha1_hex(hostname()), 0, 8;
         my $htime = strftime("%Y%m%d%H%M%S", localtime);
         my $sid = sprintf("%s-%06d-%s-%s",
             $htime,
             $m,
             $host_hex,
             $$ & 0xFFFF,
         );
         say $sid;
     '

Which gets you a SID like:

     20190323213918-404788-c2f5b994-19027

I.e.:

     <YYYYMMDDHHMMSS>-<microsecond-offset>-<8 chars of sha1(hostname -f)>-<pid>

There are obviously ways to make that more compact, but in this case I
couldn't see a reason to. Using UTC would also be a good idea.
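
For comparison, the same format can be sketched in Python with the timestamp taken in UTC. This is only an illustration of the proposal, not code from the patch; the name `make_sid` and its parameters are made up for the example, and the fields are zero-padded so the result has a fixed width.

```python
import hashlib
import os
import socket
from datetime import datetime, timezone


def make_sid(now=None, hostname=None, pid=None):
    """Build <YYYYMMDDHHMMSS>-<microseconds>-<8 hex of sha1(hostname)>-<pid>.

    All three parameters are hypothetical hooks for testing; by default the
    current UTC time, local hostname, and current PID are used.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if hostname is None:
        hostname = socket.gethostname()
    if pid is None:
        pid = os.getpid()
    host_hex = hashlib.sha1(hostname.encode("utf-8")).hexdigest()[:8]
    return "{}-{:06d}-{}-{:05d}".format(
        now.strftime("%Y%m%d%H%M%S"),
        now.microsecond,   # zero-padded so small offsets keep six digits
        host_hex,
        pid & 0xFFFF,      # same 16-bit mask as the Perl one-liner
    )
```

Zero-padding matters for a fixed-width format: a plain `%6d` pads with spaces, which would put blanks inside the SID whenever the microsecond offset is below 100000.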

All the trace2 tests pass if I fake that up. Jeff H: Do you have
anything that relies on the current format?

I'm using the SID hierarchy to track parent and child processes,
but the actual format of an individual SID-component is mostly a
black box.

I used the microseconds+pid as unique enough.  And events for new
commands will mostly just append to an existing index, rather than
being a random insert like you'd get for a GUID.

I didn't use a GUID here because that seemed overkill and a little
bit more expensive, but perhaps that was just premature optimization
on my part.

So, a new fixed width format like you suggested above would be fine.
I wonder though, if we're moving towards a stronger SID, there's no
reason to keep the PID in it.  Which makes me wonder about the value
of sha(hostname) too.  Perhaps, just make it a GUID or some combination
of the UTC date and a GUID ( <YYMMDDHHMMSS>-<microseconds>-<GUID> ) or
something like that.
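
A minimal sketch of that last variant, in Python for illustration only (the name `make_guid_sid` is hypothetical, and a random version-4 UUID stands in for "a GUID"):

```python
import uuid
from datetime import datetime, timezone


def make_guid_sid(now=None):
    """Build <YYMMDDHHMMSS>-<microseconds>-<GUID> using UTC time.

    `now` is a hypothetical testing hook; by default the current time is used.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return "{}-{:06d}-{}".format(
        now.strftime("%y%m%d%H%M%S"),
        now.microsecond,
        uuid.uuid4().hex,  # 32 hex digits, random (version 4) UUID
    )
```

Keeping the UTC timestamp in front means SIDs still sort roughly by start time, so new events would mostly append to an index instead of landing at random positions as a bare GUID would.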

If it helps, we can change how I'm reporting the SID between parent
and child processes, so that the SID field in the JSON events is
just the SID of the current process and have a peer field with the
SID-hierarchy.  This latter field would only need to be added to the
"version" or "start" event.  This might make post-processing a little
easier.  Not sure it matters one way or the other.
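
To show what that split could look like, here is a hedged Python sketch. The "/" separator between SID components, the field name `sid_hierarchy`, and the helper names are assumptions for illustration, not the current trace2 event format.

```python
import json


def split_sid(full_sid, sep="/"):
    """Return (own_sid, hierarchy) for a parent/child SID string.

    Assumes components are joined by `sep`, outermost parent first.
    """
    parts = full_sid.split(sep)
    return parts[-1], parts


def make_start_event(full_sid):
    """Build a "start" event carrying the process's own SID plus the hierarchy.

    "sid_hierarchy" is a hypothetical peer field name.
    """
    own, hierarchy = split_sid(full_sid)
    return {"event": "start", "sid": own, "sid_hierarchy": hierarchy}


if __name__ == "__main__":
    print(json.dumps(make_start_event("parent-sid/child-sid")))
```

With this shape, a post-processor can key every event on "sid" alone and read the parent chain once from the "start" (or "version") event.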

I'm open to suggestions here.

Jeff