Re: [PATCH v2 01/11] docs: new capability to advertise trace2 SIDs

On 11/12/20 12:32 PM, Junio C Hamano wrote:
Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

AFAICT the way it's documented now is the "is the session-id[...]"
paragraph in api-trace2.txt.

I'd like to see us document the implementation a bit better and
explicitly support the "hack" of setting GIT_TRACE2_PARENT_SID=hello.

I've occasionally used that hack for control/experiment-type
testing, but not that often.

I was more pointing out that I had to use that environment
inheritance mechanism so that child processes can be associated
with their Git process ancestry.  And so it is possible for someone
to abuse that mechanism for other purposes (and introduce injections
into what Josh is proposing).


I.e. maybe I've missed something but we just say "session-id is
prepended with the session-id of the parent" but don't mention that we
separate them with slashes, so you can split on that to get the depth &
individual ID's.

My reading of the updated doc patch in v3 is that not allowing
"non-printable or whitespace" allows you to e.g. have slashes in those
custom session IDs, which would be quite inconvenient since it would
break that property.

A few things I want to see stakeholders agree on:

  - In "a/b/c", what's the "session ID" of the leaf child process?
    "a/b/c"?  or "c"?  If the former (which is what I am guessing),
    do we have a name to refer to "b" or "c" alone (if not, we should
    have one)?

I consider a process' SID to be the complete "a/b/c" string.
But I do know that when I look at my logging data, I will
also find data for a process with SID of "a" and data for another
process with SID "a/b".

So perhaps we should have names for:

    [1] "a/b/c"  -- my process' complete SID name
    [2] "c"      -- my process' SID component name
    [3] "a/b"    -- my parent's complete SID name
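
To make the three names concrete, here is a minimal sketch that pulls
them out of a slash-separated SID.  The function name `split_sid` is
hypothetical (it is not part of Git), and it assumes that individual
components never contain a slash.

```python
def split_sid(sid):
    """Split a slash-separated SID into the three names listed above.

    Returns (complete SID, own component, parent's complete SID).
    The parent is None for a top-level process.
    Assumption: components themselves never contain "/".
    """
    parent, sep, component = sid.rpartition("/")
    return sid, component, (parent if sep else None)


def sid_depth(sid):
    """Number of processes in the ancestry chain, including this one."""
    return sid.count("/") + 1


print(split_sid("a/b/c"))
print(sid_depth("a/b/c"))
```

A custom component containing a slash would be misread as an extra
level here, which is the concern raised about the v3 wording.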


  - Do we encourage/force other implementations of Git protocol to
    adopt a similar "slash-separated non-whitespace ASCII printable"
    structure?  I do not think such a structure is too limiting but
    others may feel differently.  Or is a "session ID" supposed to be
    an opaque token without any structure, and upon seeing "a/b/c"
    the recipient should not read anything into its slash, or any
    relation with another session whose ID is "a/b/d"?

When analyzing Git perf data, there are times when you basically want
to "SELECT * where SID startswith 'a/b/' ..." and summarize over the
child processes of "a/b".  So data from "a/b/c" and "a/b/d" would be
aggregated.  (I do have some of that data in the "child_exit" events
reported for the "a/b" process, but sometimes you need to actually
see the records for the child processes.)

So I guess I'm saying that the hierarchy has been useful and we should
try to keep it as is.
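
As a rough illustration of that kind of query, the sketch below
filters records by SID prefix and sums a value over the children of
"a/b".  The record layout (dicts with "sid" and "elapsed" keys) and
the function names are assumptions made up for this example, not the
actual telemetry schema.

```python
def descendants(records, parent_sid):
    """Return the records whose SID lies strictly below parent_sid.

    The trailing slash matters: without it, "a/b" would also match an
    unrelated sibling such as "a/bc/x".
    """
    prefix = parent_sid + "/"
    return [r for r in records if r["sid"].startswith(prefix)]


def total_elapsed(records, parent_sid):
    """Sum the (hypothetical) "elapsed" field over all descendants."""
    return sum(r["elapsed"] for r in descendants(records, parent_sid))


# Made-up sample data for illustration only.
records = [
    {"sid": "a", "elapsed": 1.0},
    {"sid": "a/b", "elapsed": 2.0},
    {"sid": "a/b/c", "elapsed": 0.5},
    {"sid": "a/b/d", "elapsed": 0.25},
    {"sid": "a/bc/x", "elapsed": 9.0},
]
print(total_elapsed(records, "a/b"))
```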


And we should explicitly support the GIT_TRACE2_PARENT_SID=* setting
from an external process, and make the SID definition loose enough to
allow for SIDs that don't look like Git's in that chain. I.e. a useful
property (and one I've seen in the wild) is to have some external
program that already has SIDs/UUID run IDs spawn git; setting
GIT_TRACE2_PARENT_SID=<that program's SID> makes things convenient for
the purposes of logging.

Yes, it can be useful for external tools that drive Git to be able to
set a SID prefix to help track that into the Git process.
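
For example, a driver tool could do something along these lines.  This
is only a sketch: the helper name `env_with_parent_sid` is
hypothetical, and the validation simply mirrors the "non-whitespace
ASCII printable" wording discussed in this thread; it is not a rule
that Git itself is known to enforce.

```python
import os


def env_with_parent_sid(parent_sid, base_env=None):
    """Return a copy of the environment with GIT_TRACE2_PARENT_SID set.

    Assumption: the SID must be non-empty and consist only of
    non-whitespace printable ASCII (codes 33..126), per the wording
    under discussion in this thread.
    """
    if not parent_sid or any(not (33 <= ord(ch) <= 126) for ch in parent_sid):
        raise ValueError("SID must be non-whitespace printable ASCII")
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_TRACE2_PARENT_SID"] = parent_sid
    return env


# Typical use from the driver tool (not executed here):
#   import subprocess
#   subprocess.run(["git", "status"], env=env_with_parent_sid("build-1234"))
```

Slashes are accepted by this check, so a tool that passes "x/y" adds
two levels to the hierarchy rather than one.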

Likewise, it would be nice to add code to some of the Git shell script
commands (such as git-mergetool and git-prompt) to augment the SID
being passed to child Git commands to help track why they are being
invoked.

Jeff





