Re: [PATCH v2] refs: fix creation of corrupted reflogs for symrefs

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 23 Jan 2025 09:55:49 -0800

Karthik Nayak <karthik.188@xxxxxxxxx> writes:

> Subject: Re: [PATCH v2] refs: fix creation of corrupted reflogs for symrefs

This may be just me, but every time I see the above title, it read
to me as if we are on purpose doing "creation of corrupted reflogs
for symrefs", but we are failing to do so for some reason, and this
commit is about improving the situation so that we can correctly
create corrupted reflog entries for symbolic ref updates.

And because the only sensible reason why we may on purpose do
"creation of corrupted reflogs" I can think of is perhaps we prepare
such corrupted thing to test how robust the production code is when
seeing such corrupted data, I would expect to see a change to t/
hierarchy.

But the patch touches the code, not just tests, which makes me
doubly puzzled.

It happens every time I see this title and the change.  Perhaps drop
"corrupted" from the title?

> The commit 297c09eabb (refs: allow multiple reflog entries for the same
> refname, 2024-12-16) added logic for reflogs to exit early in
> `lock_ref_for_update()` after obtaining the required lock. This was

I do not think the actor, who "exits early", is not "reflogs".
Should we have "for reflogs" in the above, or perhaps move it to the
end of the sentence (i.e. the required lock gets obtained because we
want to do some operation "for reflogs")?

> added as a performance optimization as it was assumed that no further
> processing was required for reflog-only updates. However this was
> incorrect since for a symref's reflog entry, the update needs to be
> populated with the old_oid value. This is done right after the early
> exit.

"The early exit skipped this required work"?

> This caused a bug in Git 2.48 in the files backend where target
> references of symrefs being updated would create a corrupted reflog
> entry for the symref since the old_oid is not populated. Undo the skip
> in logic to fix this issue and also add a test to ensure that such an
> issue doesn't arise in the future.

OK.

> The early exit was added as a performance optimization for reflog-only
> updates, and it wasn't essential to the original changes. As such,
> reverting it shouldn't cause any further issues.

I am not sure if this is even worth saying, as you already said that
the early return was done incorrectly assuming that the remainder of
the function can be skipped as an optimization.  What may help
readers is to state that all the rest of the code path that was
skipped by a mistaken optimization is necessary and would not do
anything unwanted.

> Reported-by: Nika Layzell <nika@xxxxxxxxxxxxxxx>
> Co-authored-by: Jeff King <peff@xxxxxxxx>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>

> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
> index e2316f1dd4..29045aad43 100755
> --- a/t/t1400-update-ref.sh
> +++ b/t/t1400-update-ref.sh
> @@ -2068,4 +2068,13 @@ do
>  
>  done
>  
> +test_expect_success 'update-ref should also create reflog for HEAD' '
> +	test_commit to-rewind &&
> +	git rev-parse HEAD >expect &&
> +	head=$(git symbolic-ref HEAD) &&
> +	git update-ref --create-reflog "$head" HEAD~ &&
> +	git rev-parse HEAD@{1} >actual &&
> +	test_cmp expect actual
> +'

Nice.  We could rename "head" to something more meaningful (like
"current branch") but I can live with the above version.  It is much
nicer than assuming on what branch we would be, which was what the
previous iteration did.

Thanks.