Re: [idea] File history tracking hints

On 10/2/2017 1:41 PM, Stefan Beller wrote:
It would be nice if every file (and tree) had a permanent GUID
associated with it.  Then the filename/pathname becomes a property
of the GUIDs.  Then you can exactly know about moves/renames with
minimal effort (and no guessing).

...

https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.7211@xxxxxxxxxxxxxxx/

I'd encourage people to read and re-read that message until they can
recite it by heart.

I have reconsidered the idea of GUIDs as proposed by Jeff and wanted
to reply. After rereading that message, I think my thoughts are
already covered by:

   - you're doing the work at the wrong point for _another_ reason. You're
      freezing your (crappy) algorithm at tree creation time, and basically
      making it pointless to ever create something better later, because even
      if hardware and software improves, you've codified that "we have to
      have crappy information".

--
My design proposal for these "rename hints" would be a special trailer,
roughly:

     Rename: LICENSE -> legal.txt
     Rename: t/* -> tests/*

or more generally:

     Rename: <pathspec> <delim> <pathspec>
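
As a rough illustration of how such trailers might be consumed, here is a minimal Python sketch. It assumes "->" is the only delimiter and supports only a single trailing "*" wildcard (so "t/*" maps everything under t/). The names parse_rename_hints and apply_hint are hypothetical; this is not existing Git code.

```python
import re

# Hypothetical trailer form from the proposal: "Rename: <pathspec> -> <pathspec>".
# Assumption: "->" is the only accepted delimiter.
TRAILER_RE = re.compile(r"^Rename:\s*(\S+)\s*->\s*(\S+)\s*$")


def parse_rename_hints(message):
    """Return (old_pattern, new_pattern) pairs found in Rename: trailer lines."""
    hints = []
    for line in message.splitlines():
        m = TRAILER_RE.match(line.strip())
        if m:
            hints.append((m.group(1), m.group(2)))
    return hints


def apply_hint(path, old_pattern, new_pattern):
    """Map one path through one hint, or return None if the hint does not apply.

    Simplification: only exact paths or a single trailing '*' are supported."""
    if old_pattern.endswith("*") and new_pattern.endswith("*"):
        prefix = old_pattern[:-1]
        if path.startswith(prefix):
            return new_pattern[:-1] + path[len(prefix):]
        return None
    return new_pattern if path == old_pattern else None
```

For example, apply_hint("t/t0000-basic.sh", "t/*", "tests/*") gives "tests/t0000-basic.sh", while a path that no hint covers gives None.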

This, however, has multiple issues due to potential
human error:
(A) typos in the trailer key or in the pathspec
    (resulting in different error modes)
(B) partial hints (we currently live in a world of
    completely missing hints, so I would not expect partial
    hints to make things any worse)
(C) wrong hints. These ought not to be a problem, as Git
    would merely spend some CPU time concluding the hint was bogus.
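
To make (C) concrete: a bogus hint can be rejected by checking it against the actual file contents, which is where that extra CPU time would go. The sketch below is a stand-in, not Git's rename scoring: it uses Python's difflib.SequenceMatcher with an assumed 50% cutoff (chosen to echo the default threshold of Git's -M option), and verify_hint is a hypothetical name.

```python
from difflib import SequenceMatcher

# Assumed cutoff: 50% similarity, echoing the default of Git's -M option.
# SequenceMatcher is only a stand-in; Git computes its own similarity score.
THRESHOLD = 0.5


def verify_hint(old_tree, new_tree, old_path, new_path, threshold=THRESHOLD):
    """Return True if the hinted rename old_path -> new_path looks plausible.

    old_tree and new_tree map path -> file contents (str)."""
    if old_path not in old_tree or new_path not in new_tree:
        return False  # the hint names a path missing on one side: bogus
    ratio = SequenceMatcher(None, old_tree[old_path], new_tree[new_path]).ratio()
    return ratio >= threshold
```

A wrong hint therefore costs one content comparison and is then discarded, rather than corrupting the output.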

For (A), I would imagine we want a mechanism (e.g. notes)
to "correct" the hints. This is similar to a typo in a
commit message, which we currently just ignore once the
commit has been merged to, e.g., master.

So maybe we'd just design around that, providing an option
to supply the correct hints on the command line.

So if the commit has the typo'd hint

     Remame:  t/* -> tests/*

the human would notice it (and could also infer the intended
rename from the commit message), and then invoke

     git log -C -C-hint="t/* -> tests/*" ...

which would have the corrected hint and hence deliver
the best output.
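
A minimal sketch of how a "-C-hint" value might be combined with the commit's own trailers, assuming hints are (old, new) pattern pairs. It also flags trailer keys that look like misspellings of "Rename" (such as "Remame") using difflib.get_close_matches with a cutoff of 0.8. The function collect_hints, the cutoff, and the precedence rule are all hypothetical choices, not part of the proposal.

```python
import difflib
import re

# Any "Key: a -> b" trailer; the key is checked separately.
TRAILER_RE = re.compile(r"^([A-Za-z-]+):\s*(\S+)\s*->\s*(\S+)\s*$")


def collect_hints(message, cli_hints=()):
    """Return (hints, suspicious_keys) for one commit message.

    cli_hints are (old, new) pairs from a hypothetical -C-hint option. They are
    placed first, so a consumer that uses the first matching hint prefers them
    over the commit's own trailers."""
    hints = list(cli_hints)
    suspicious = []
    for line in message.splitlines():
        m = TRAILER_RE.match(line.strip())
        if not m:
            continue
        key, old, new = m.groups()
        if key == "Rename":
            hints.append((old, new))
        elif difflib.get_close_matches(key, ["Rename"], n=1, cutoff=0.8):
            # Near-miss key such as "Remame": report it instead of trusting it.
            suspicious.append(key)
    return hints, suspicious
```

With the typo'd commit above and -C-hint="t/* -> tests/*", the corrected hint is used and "Remame" is reported as suspicious.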

Maybe the "-C-hint" flag is the best starting point when
going in that direction?

Thanks,
Stefan


Sorry to re-re-...-re-stir up such an old topic.

I wasn't really thinking about commit-to-commit hints.
I think these have lots of problems.  (If commit A->B does
"t/* -> tests/*" and commit B->C does "test/*.c -> xyx/*",
then you need a way to compute a transitive closure to see
the net-net hints for A->C.  I think that quickly spirals
out of control.)
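
To illustrate why the closure is awkward, the sketch below follows a single concrete path through a chain of per-commit hints (A->B, then B->C). It assumes hints are exact paths or patterns with a single trailing "*"; map_path and compose are hypothetical helpers. Following one path is easy, but it must be repeated for every path at every commit, it never yields a single net pattern for A->C, and a pattern such as "test/*.c" is not handled at all.

```python
def map_path(path, hints):
    """Apply the first matching (old, new) hint to one concrete path.

    Supports exact paths and patterns with a single trailing '*'."""
    for old, new in hints:
        if old.endswith("*") and new.endswith("*"):
            if path.startswith(old[:-1]):
                return new[:-1] + path[len(old) - 1:]
        elif path == old:
            return new
    return path  # no hint matched: assume the path did not change


def compose(path, hints_per_commit):
    """Follow one path through per-commit hint lists, e.g. [A->B hints, B->C hints]."""
    for hints in hints_per_commit:
        path = map_path(path, hints)
    return path
```

For example, compose("t/foo.c", [[("t/*", "tests/*")], [("tests/*", "xyz/*")]]) yields "xyz/foo.c".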

No, I was going in another direction.  For example, if a
tree-entry contains { file-guid, file-name, file-sha, ... }
then when diffing any 2 commits, you can match up files
(and folders) by their guids.  Renames pop out trivially when
their file-names don't match.  File moves pop out when the
file-guids appear in different trees.  Adds and deletes pop
out when file-guids don't have a peer. (I'm glossing over some
of the details, but you get the idea.)  To address Junio's
question, independently added files with the same name will
have 2 different file-guids.  We amend the merge rules to
handle this case and pick one of them (say, the one that
sorts lower) as the winner and move on.
All-in-all the solution is not trivial (as there are a few
edge cases to deal with), but it better matches the (casual)
user's perception of what happened to their tree over time.
It also doesn't require expensive code to sniff for renames
on every command (which doesn't scale on really large repos).
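
A minimal sketch of the GUID-based matching described above, under simplifying assumptions: each tree is flattened to a {guid: Entry} mapping with full paths (so a rename and a move both show up as a path change, rather than via per-tree GUIDs), and the GUID strings, Entry, diff_by_guid and resolve_same_name are hypothetical. No content comparison is needed; changes fall out of comparing entries that share a GUID.

```python
from dataclasses import dataclass


# Hypothetical tree entry: { file-guid, file-name, file-sha } with the GUID as the key.
@dataclass(frozen=True)
class Entry:
    path: str
    sha: str


def diff_by_guid(old, new):
    """Classify changes between two {guid: Entry} trees without rename sniffing."""
    changes = []
    for guid in sorted(old.keys() | new.keys()):
        a, b = old.get(guid), new.get(guid)
        if a is None:
            changes.append(("add", b.path))  # GUID has no peer in the old tree
        elif b is None:
            changes.append(("delete", a.path))  # GUID has no peer in the new tree
        else:
            if a.path != b.path:
                changes.append(("rename", a.path, b.path))
            if a.sha != b.sha:
                changes.append(("modify", b.path))
    return changes


def resolve_same_name(guids):
    """Merge rule from the message: when independently added files share a
    name, keep the GUID that sorts lower."""
    return min(guids)
```

For example, if GUID g1 is LICENSE in the old tree and legal.txt in the new one with the same sha, diff_by_guid reports ("rename", "LICENSE", "legal.txt") without reading any file contents.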

But as I said before, that ship has sailed...
Jeff


