Re: Bug in 2.48 with `git refs migrate`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:

Hello brian,

> Hi,
>
> I noticed that Git 2.48 has support for migrating refs when there are
> reflogs and, as promised at Git Merge, I decided to try it out.
> Unfortunately, I got an error:
>

Thanks for trying it out and reporting.

>
> ----
> % git refs migrate --ref-format=reftable --dry-run
> error: reftable: transaction failure: corrupt reftable file
> ----
>
> Here's a small reproduction case:
>
> ----
> #!/bin/sh
>
> rm -fr test-repo
> git init -b dev test-repo
> cd test-repo
>
> # start first block
> touch foo.txt
> git add foo.txt
> git commit -m +
>
> head=$(git rev-parse HEAD)
> seq 5000 | sed -Ee "s!^(.*)\$!create refs/heads/ref-\1 $head!" | git update-ref --stdin
> # end first block
>
> # start second block
> echo abc >bar.txt
> git add bar.txt
> git commit -m +
> head=$(git rev-parse HEAD)
> seq 3000 | sed -Ee "s!^(.*)\$!update refs/heads/ref-\1 $head!" | git update-ref --stdin
> # end second block
>
> git refs migrate --ref-format=reftable
> ----
>
> I can also reproduce this on the latest master.
>

I can also reproduce it using the script provided here.

>
> If you remove the second block, it does not appear to reproduce.  Some
> investigation led me to the conclusion that the difference is when
> max_update_index is not 1, the header has the value 1 for it but the
> trailer has the correct value, and so we flag the header and trailer as
> mismatching and therefore it gets marked as corrupt.  I believe the
> reason things work when removing the second block is because that value
> remains 1, and so it works.
>

Indeed, you're absolutely correct here, this is because the
`max_update_index` is indeed different between the footer and the
header. This flow is triggered when we have multiple reftable blocks
being written with multiple reflog entries. The multiple blocks is
crucial because we update the `max_update_index` right before writing
the reflogs. So if there are refs being written and they spawn more than
one block, the header is written without the updated `max_update_index`,
causing the mismatch.

> I haven't done anything else to investigate here, for which I apologize,
> but I just wanted to mention it while it was fresh on my mind.
>

I think you've provided sufficient information and also enough to make
it easy for me to debug.

> In case this is helpful, I did see this when attempting to migrate two
> work repositories with lots of reflogs and many refs (the smaller has
> 2983 and the larger, 44832).  I obviously cannot send you these
> repositories or things in them, but I'm happy to test patches against
> them.
>

I'm attaching a patch below which should fixes the issue for me and also
adding a test to test against the same. I'd be grateful if you could
also test the patch against the repositoryies you mention.

> Please let me know if I can provide more useful information.
> --
> brian m. carlson (they/them or he/him)
> Toronto, Ontario, CA

-- >8 --

Subject: [PATCH] reftable: write correct max_update_index to header

In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done modifying the
`update_index` for reflogs with multiple indices. During writing the
logs, the `max_update_index` of the writer was modified to ensure the
limits were raised to the modified `update_index`s.

However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`min_update_index`. This causes a mismatch between the header and the
footer and causes the reftable file to be corrupted. The existing tests
only spawn a single block and since headers are lazily written with the
first block, the tests didn't capture this bug.

To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the `max_index` within all its updates, then
propagate this value to the reftable backend, wherein this is used to
the set the `max_update_index` correctly.

Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.

Reported-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
---
 refs.c                  |  7 +++++++
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 20 ++++++++++----------
 t/t1460-refs-migrate.sh | 12 ++++++++++++
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 0f41b2fd4a..f7b6f0f897 100644
--- a/refs.c
+++ b/refs.c
@@ -1345,6 +1345,13 @@ int ref_transaction_update_reflog(struct
ref_transaction *transaction,
 	update->flags &= ~REF_HAVE_OLD;
 	update->index = index;

+	/*
+	 * Reference backends may need to know the max index to optimize
+	 * their writes. So we store the max_index on the transaction level.
+	 */
+	if (index > transaction->max_index)
+		transaction->max_index = index;
+
 	return 0;
 }

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 16550862d3..aaab711bb9 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -203,6 +203,7 @@ struct ref_transaction {
 	enum ref_transaction_state state;
 	void *backend_data;
 	unsigned int flags;
+	unsigned int max_index;
 };

 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 00d95a9a2f..289496058e 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -942,6 +942,7 @@ struct write_transaction_table_arg {
 	size_t updates_nr;
 	size_t updates_alloc;
 	size_t updates_expected;
+	unsigned int max_index;
 };

 struct reftable_transaction_data {
@@ -1428,7 +1429,6 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
-	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;

@@ -1438,7 +1438,12 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data

 	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);

-	reftable_writer_set_limits(writer, ts, ts);
+	/*
+	 * During reflog migration, we add indexes for a single reflog with
+	 * multiple entries. Each entry will contain a different update_index,
+	 * so set the limits accordingly.
+	 */
+	reftable_writer_set_limits(writer, ts, ts + arg->max_index);

 	for (i = 0; i < arg->updates_nr; i++) {
 		struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1540,12 +1545,6 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 				 */
 				log->update_index = ts + u->index;

-				/*
-				 * Note the max update_index so the limit can be set later on.
-				 */
-				if (log->update_index > max_update_index)
-					max_update_index = log->update_index;
-
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1609,8 +1608,6 @@ static int write_transaction_table(struct
reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
-		reftable_writer_set_limits(writer, ts, max_update_index);
-
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;
@@ -1631,6 +1628,9 @@ static int reftable_be_transaction_finish(struct
ref_store *ref_store UNUSED,
 	struct reftable_transaction_data *tx_data = transaction->backend_data;
 	int ret = 0;

+	if (tx_data->args)
+		tx_data->args->max_index = transaction->max_index;
+
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index f59bc4860f..307b2998ef 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -227,6 +227,18 @@ do
 	done
 done

+test_expect_success 'multiple reftable blocks with multiple entries' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo first &&
+	printf "create refs/heads/ref-%d HEAD\n" $(test_seq 5000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_commit -C repo second &&
+	printf "update refs/heads/ref-%d HEAD\n" $(test_seq 3000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_migration repo reftable
+'
+
 test_expect_success 'migrating from files format deletes backend files' '
 	test_when_finished "rm -rf repo" &&
 	git init --ref-format=files repo &&
-- 
2.47.0

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux