Re: Usability issue: "Your branch is up to date"

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 04 Feb 2025 09:43:10 -0800

Manuel Quiñones <manuel.por.aca@xxxxxxxxx> writes:

> Thanks for the insightful explanation Junio! Looking forward, do you
> think that it could be possible to record the timestamp that the
> remote-tracking branch has been updated with the remote branch? In
> order to make such information available to the end user.

The time at which each remote-tracking branch was updated is already
recorded in the reflog.  What is missing is the time at which a
fetch checked whether a remote-tracking branch needed updating, found
that the branch at the remote hadn't changed, and did not update the
remote-tracking branch.

You'd need to first design where to store that information and how.

It does not have to be in the reflog, but as a thought experiment,
let's see how the design would go if we decided to use the reflog to
store that information.

What a reflog entry records, in textual form, looks like

<old-object-name> <new-object-name> <user-ident> <timestamp> <comment>

We can imagine adding a new reflog entry whenever "git fetch" finds
that the branch at the remote hasn't been updated, with the same
value in <old-object-name> and <new-object-name>.
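
To make the format concrete, here is a minimal Python sketch (an
illustration only, not code from git) that splits one reflog line
into the fields above and flags an entry whose old and new object
names match.  It assumes the layout used by the files backend, where
the comment follows a tab and the timestamp is followed by a timezone
offset; the names ReflogEntry, parse_reflog_line and is_noop are made
up for this example, and the sample line is invented.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str        # <old-object-name>
    new: str        # <new-object-name>
    ident: str      # <user-ident>, e.g. "A U Thor <author@example.com>"
    timestamp: int  # seconds since the epoch
    tz: str         # timezone offset, e.g. "-0800"
    comment: str    # <comment>


def parse_reflog_line(line: str) -> ReflogEntry:
    # Assumption: the comment is separated from the rest by a tab.
    head, _, comment = line.rstrip("\n").partition("\t")
    old, new, rest = head.split(" ", 2)
    # The ident may contain spaces, so peel the last two fields off the end.
    ident, timestamp, tz = rest.rsplit(" ", 2)
    return ReflogEntry(old, new, ident, int(timestamp), tz, comment)


def is_noop(entry: ReflogEntry) -> bool:
    # The imagined "fetch found nothing new" entry has old == new.
    return entry.old == entry.new


# Invented sample line, for illustration only.
sample = ("a" * 40 + " " + "a" * 40 +
          " A U Thor <author@example.com> 1738690990 -0800\tfetch: no change")
entry = parse_reflog_line(sample)
```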

A reflog file I randomly picked as a sample is ~5k long with 34
entries (it keeps track of my fetching from and pushing to
https://git.kernel.org/pub/scm/git/git.git/#master), so a reflog
costs around 150 bytes per entry, and if you fetch once every hour
that would be like ~3k per branch per day.
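
The back-of-the-envelope estimate above can be redone in a few lines
of Python.  This is only a sketch: it reads "~5k" as 5 * 1024 bytes,
which is an assumption, and the variable names are made up here.

```python
file_size_bytes = 5 * 1024  # "~5k", assumed to mean 5 KiB
entries = 34                # entries in the sampled reflog

# Average cost of one reflog entry, roughly 150 bytes.
bytes_per_entry = file_size_bytes / entries

# One no-op entry per hourly fetch, for a single remote-tracking branch.
fetches_per_day = 24
bytes_per_branch_per_day = bytes_per_entry * fetches_per_day

print(f"{bytes_per_entry:.0f} bytes per entry")
print(f"{bytes_per_branch_per_day / 1024:.1f} KiB per branch per day")
```

With these inputs the daily figure lands between 3k and 4k, the same
ballpark as the estimate above.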

While that is a trivial and insignificant number from a storage cost
point of view, if you are monitoring the progress of the remote with
"git reflog origin/main", I suspect that such a change would make it
unusably noisy, so the "git reflog" command may need to grow an option
that tells it to skip these no-op entries.
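
What such a skipping option would have to do can be sketched in
Python.  This is purely illustrative: no option of this kind is
described in this message, the function name skip_noop_entries is
hypothetical, and the input is assumed to be raw reflog lines that
start with the old and new object names separated by spaces.

```python
def skip_noop_entries(lines):
    """Return only the reflog lines whose old and new object names differ."""
    kept = []
    for line in lines:
        fields = line.split(" ", 2)
        # A no-op entry records the same value on both sides.
        if len(fields) >= 2 and fields[0] == fields[1]:
            continue
        kept.append(line)
    return kept


# Invented sample data: one real update followed by two no-op checks.
a, b = "a" * 40, "b" * 40
log = [
    f"{a} {b} A U Thor <author@example.com> 1738600000 -0800\tfetch: fast-forward",
    f"{b} {b} A U Thor <author@example.com> 1738603600 -0800\tfetch: no change",
    f"{b} {b} A U Thor <author@example.com> 1738607200 -0800\tfetch: no change",
]
quiet_log = skip_noop_entries(log)
```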

As to the required change to "git fetch", this may be a bit tricky.

IIRC (I am writing from memory without looking at the code),
when you say "git fetch [<remote> [<refspec>...]]", what it does
is roughly to:

 - figure out what <remote> and <refspec>... to use from the
   configuration, if omitted on the command line.

 - connect to the remote, and ask for the current values of their refs.

 - drop any refspec <src>:<dst> whose <dst> side already has the
   value the remote has.

 - drive the object transfer machinery to receive the pack data from
   the remote and store it locally.

 - update the remote-tracking branches.

And the last step is where the remote-tracking branches are updated,
together with their reflog (if enabled).  Because that step does not
even see the remote-tracking branches whose values do not need to
change (filtered out earlier to help reduce the number of refs fed
to the object transfer machinery), the "drop no-op early" part needs
to be designed differently (e.g. mark them as no-op, so that the
object transfer machinery can notice and ignore them) and then the
"update refs" step can see these no-op updates.

I do not think writing the "no-op" reflog entries should be done at
a step separate from the step that writes the real ref updates, as I
suspect that such a separate update scheme would have funny
interactions with "git fetch --atomic".
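
The redesign described in the two paragraphs above can be sketched as
a toy model in Python.  It is not how git is implemented; every name
here (RefUpdate, plan_updates, wanted_objects, write_updates) is
hypothetical, refs are plain dictionaries, and the reflog is a list
of tuples.  The point is only the shape: no-op updates are marked
instead of dropped, the transfer step ignores them, and a single
final step writes real and no-op entries together.

```python
from dataclasses import dataclass

ZERO = "0" * 40  # stand-in for "ref does not exist yet"


@dataclass
class RefUpdate:
    dst: str  # remote-tracking ref to update
    old: str  # value we currently have
    new: str  # value the remote advertises

    @property
    def is_noop(self) -> bool:
        return self.old == self.new


def plan_updates(remote_refs, local_refs, refspecs):
    """Keep every matching refspec, including ones that change nothing."""
    updates = []
    for src, dst in refspecs:
        if src in remote_refs:
            updates.append(RefUpdate(dst, local_refs.get(dst, ZERO), remote_refs[src]))
    return updates


def wanted_objects(updates):
    """Only real updates are fed to the (imaginary) object transfer step."""
    return [u.new for u in updates if not u.is_noop]


def write_updates(updates, local_refs, reflog, now):
    """One step writes both real and no-op entries, so they stay together."""
    for u in updates:
        local_refs[u.dst] = u.new
        comment = "fetch: no change" if u.is_noop else "fetch: update"
        reflog.setdefault(u.dst, []).append((u.old, u.new, now, comment))
```

Because write_updates sees every planned update, anything that has to
succeed or fail as a unit would cover the no-op entries as well.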

So, do I think it could be possible?  Sure.  Do I think it would be
too hard as a rocket surgery?  No.  Will I jump up and down excited
and start coding?  I am not interested all that much, but I can help
review patches if somebody else works on it.

There may be some other downsides (other than the cost of storage
and making the reflog noisy) I haven't thought about, which need to
be considered if somebody decides to work on this.

Thanks.