Re: refs/notes/amlog problems, was Re: [PATCH v3 01/20] linear-assignment: a function to solve least-cost assignment problems

Jeff King <peff@xxxxxxxx> · Tue, 24 Jul 2018 05:45:01 -0400

On Mon, Jul 23, 2018 at 06:50:46PM -0700, Junio C Hamano wrote:

> 	Side Note: there are a few workflow elements I do want to
> 	keep using but they currently *lose* the mapping info.  An
> 	obvious one is
> 
> 	  $ git checkout -b to/pic master &&
> 	  ... review in MUA and then ...
> 	  $ git am -s mbox &&
> 	  ... review in tree, attempt to build, tweak, etc.
>           $ git format-patch --stdout master..to/pic >P &&
>           $ edit P &&
>           $ git reset --hard master &&
>           $ git am P
> 
> 	which is far more versatile and efficient when doing certain
> 	transformations on the series than running "rebase -i" and
> 	reopening and editing the target files of the patches one by
> 	one in each step.  But because format-patch does not
> 	generate Message-Id header of the original one out of the
> 	commit, the post-applypatch hook run by "am" at the end of
> 	the steps would not have a chance to record that for the
> 	newly created commit.
> 
> 	For this one, I think I can use "format-patch --notes=amlog"
> 	to produce the patch file and then teach post-applypatch
> 	script to pay attention to the Notes annotation without
> 	changing anything else to record the message id of the
> 	original.

Yes. I wonder if it would make sense to teach format-patch/am a
micro-format to automatically handle this case. I.e., some
machine-readable way of passing the notes in the email message.
Of course it's easy to design a format that covers the relatively
restricted form of these amlog notes, and much harder to cover the
general case.

>       Other workflow elements that lose the notes need
> 	to be identified and either a fix implemented or a
> 	workaround found for each of them.  For example, I suspect
> 	there is no workaround for "cherry-pick" and it would take a
> 	real fix.

I think the existing notes.rewriteRef is probably a good match here. I
can definitely think of notes you wouldn't want to cherry-pick, but I'm
having trouble coming up with an example that should survive a rebase
but not a cherry-pick.

> And these (incomplete) reverse mapping entries get in the way to
> maintain and correct the forward mapping.  When a commit that got
> unreachable gets expired, I want "git notes prune" to remove notes
> on them, and I do not want to even think about what should happen to
> the entries in the notes tree that abuse the mechanism to map blobs
> that are otherwise *not* even reachable from the main history.

Right, I think the notes tree is a poor distribution method for that
reason.

> > Though personally, I do not know if there is much point in pushing it
> > out, given that receivers can reverse the mapping themselves.
> 
> Before this thread, I was planning to construct and publish the
> reverse mapping at the end of the day, but do so on a separate notes
> ref (see above---the hacky abuse gets in the way of maintaining and
> debugging the forward mapping, but a separate notes-ref that only
> contains hacks is less worrysome).  But I have changed my mind and
> decided not to generate or publish one.  It is sort of similar to
> the way the pack .idx is constructed only by the receiver [*1*].

Yes, the pack .idx was the same mental model I had when writing my
earlier message.

> > Or is there some argument that there is information in the reverse map
> > that _cannot_ be generated from the forward map?
> 
> I know there is no information loss (after all I was the only one
> who ran that experimental hack), but there is one objection that is
> still possible, even though I admit that is a weak argument.

I wondered if you might have a case like this (building as we go):

 - message-id M becomes commit X
   - we write the forward map X->M
   - we write the reverse map M->X
 - during a rewrite (e.g., --amend), commit X becomes commit Y
   - we write the forward map Y->M
   - we write the reverse map M->Y

The difference between that result and an inverted map created at the
end is that we know that M->Y is the final result. Whereas by looking at
the inverted map, we do not know which of M->X and M->Y is correct. In
fact they are _both_ correct. But only one of X and Y would eventually
get merged (both may make it into the repo's of people fetching from you
if we imagine that X is on "pu" and you push between the two steps).

So I think the inverted mapping is not actually one-to-one, and in
either case you'd want to retain all possible matches (pruning only when
a commit is eventually dropped from the forward mapping, which rewritten
things from "pu" would eventually do). And in that case it does not
matter if you generate it incrementally or all at once.

> If a plumbing "diff-{files,tree,index}" family had a sibling
> "diff-notes" to compare two notes-shaped trees while pretending that
> the object-name fan-out did not exist (i.e. instead, the trees being
> compared is without a subtree and full of 40-hex filenames), then it
> would be less cumbersome to incrementally update the reverse mapping
> by reading forward mapping with something like:
> 
> 	git diff-notes --raw amlog@{1} amlog
> 
> to learn the commits whose notes have changed.  But without such a
> plumbing, it is cumbersome to do so correctly.  "git diff-tree -r"
> could serve as a rough substitute, until the note tree grows and get
> rebalanced by reorganizing the fan-out, and on the day it happens
> the reverse mapper needs to read and discard ghost changes that are
> only due to tree reorganizing [*2*].

Yeah. My "log" hackery was trying to do that incremental comparison, but
it did not handle the multiple-commit case (nor did it handle
deletions). I agree an end-point diff is sufficient (and more
efficient).

-Peff