[PATCH v2] reftable: write correct max_update_index to header

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done modifying the
`update_index` for reflogs with multiple indices. During writing the
logs, the `max_update_index` of the writer was modified to ensure the
limits were raised to the modified `update_index`s.

However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`min_update_index`. This causes a mismatch between the header and the
footer and causes the reftable file to be corrupted. The existing tests
only spawn a single block and since headers are lazily written with the
first block, the tests didn't capture this bug.

To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the `max_index` within all its updates, then
propagate this value to the reftable backend, wherein this is used to
the set the `max_update_index` correctly.

Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.

Reported-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
---

Hello,

While this patch was merged to next, Dscho reported that it was flaky
on macos pipeline. On further investigation I found this was easily
reproducible when the leak sanitizer was turned on and the reftable
tests were run. The fix was simply to add the missing 0 initialization.

I also found that we didn't set the appropriate values for all elements
of the `reftable_transaction_data`. Which I've done now.

The patch is based on Maint f93ff170b9 (Git 2.48.1, 2025-01-13). 

Thanks

Changes from v1:
1. Correctly intialize the value to `0` for `write_transaction_table_arg.max_index`.
2. Set the `max_index` for all elements of `reftable_transaction_data`.
3. The range diff was made against the commit in Junio's branch
   `gitster/kn/reflog-migration-fix`, which included his 'S-o-b', which I've
   removed in this patch, since that is usually applied by Junio. Not sure if
   that was the right move. 

Range diff:
1:  bc67b4ab5f ! 1:  5489826b47 reftable: write correct max_update_index to header
    @@ Commit message
     
         Reported-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx>
         Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
    -    Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
     
      ## refs.c ##
     @@ refs.c: int ref_transaction_update_reflog(struct ref_transaction *transaction,
    @@ refs/reftable-backend.c: struct write_transaction_table_arg {
      };
      
      struct reftable_transaction_data {
    +@@ refs/reftable-backend.c: static int prepare_transaction_update(struct write_transaction_table_arg **out,
    + 		arg->updates_nr = 0;
    + 		arg->updates_alloc = 0;
    + 		arg->updates_expected = 0;
    ++		arg->max_index = 0;
    + 	}
    + 
    + 	arg->updates_expected++;
     @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writer *writer, void *cb_data
      	struct reftable_log_record *logs = NULL;
      	struct ident_split committer_ident = {0};
    @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writ
      		if (ret < 0)
      			goto done;
     @@ refs/reftable-backend.c: static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
    - 	struct reftable_transaction_data *tx_data = transaction->backend_data;
      	int ret = 0;
      
    -+	if (tx_data->args)
    -+		tx_data->args->max_index = transaction->max_index;
    -+
      	for (size_t i = 0; i < tx_data->args_nr; i++) {
    ++		tx_data->args[i].max_index = transaction->max_index;
    ++
      		ret = reftable_addition_add(tx_data->args[i].addition,
      					    write_transaction_table, &tx_data->args[i]);
    + 		if (ret < 0)
     
      ## t/t1460-refs-migrate.sh ##
     @@ t/t1460-refs-migrate.sh: do
---
 refs.c                  |  7 +++++++
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 20 ++++++++++----------
 t/t1460-refs-migrate.sh | 12 ++++++++++++
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/refs.c b/refs.c
index 0f41b2fd4a..f7b6f0f897 100644
--- a/refs.c
+++ b/refs.c
@@ -1345,6 +1345,13 @@ int ref_transaction_update_reflog(struct ref_transaction *transaction,
 	update->flags &= ~REF_HAVE_OLD;
 	update->index = index;
 
+	/*
+	 * Reference backends may need to know the max index to optimize
+	 * their writes. So we store the max_index on the transaction level.
+	 */
+	if (index > transaction->max_index)
+		transaction->max_index = index;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 16550862d3..aaab711bb9 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -203,6 +203,7 @@ struct ref_transaction {
 	enum ref_transaction_state state;
 	void *backend_data;
 	unsigned int flags;
+	unsigned int max_index;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 00d95a9a2f..d39a14c5a4 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -942,6 +942,7 @@ struct write_transaction_table_arg {
 	size_t updates_nr;
 	size_t updates_alloc;
 	size_t updates_expected;
+	unsigned int max_index;
 };
 
 struct reftable_transaction_data {
@@ -1019,6 +1020,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		arg->updates_nr = 0;
 		arg->updates_alloc = 0;
 		arg->updates_expected = 0;
+		arg->max_index = 0;
 	}
 
 	arg->updates_expected++;
@@ -1428,7 +1430,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
-	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1438,7 +1439,12 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 
 	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
 
-	reftable_writer_set_limits(writer, ts, ts);
+	/*
+	 * During reflog migration, we add indexes for a single reflog with
+	 * multiple entries. Each entry will contain a different update_index,
+	 * so set the limits accordingly.
+	 */
+	reftable_writer_set_limits(writer, ts, ts + arg->max_index);
 
 	for (i = 0; i < arg->updates_nr; i++) {
 		struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1540,12 +1546,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				 */
 				log->update_index = ts + u->index;
 
-				/*
-				 * Note the max update_index so the limit can be set later on.
-				 */
-				if (log->update_index > max_update_index)
-					max_update_index = log->update_index;
-
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1609,8 +1609,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
-		reftable_writer_set_limits(writer, ts, max_update_index);
-
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;
@@ -1632,6 +1630,8 @@ static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
 	int ret = 0;
 
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		tx_data->args[i].max_index = transaction->max_index;
+
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
 		if (ret < 0)
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index f59bc4860f..307b2998ef 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -227,6 +227,18 @@ do
 	done
 done
 
+test_expect_success 'multiple reftable blocks with multiple entries' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo first &&
+	printf "create refs/heads/ref-%d HEAD\n" $(test_seq 5000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_commit -C repo second &&
+	printf "update refs/heads/ref-%d HEAD\n" $(test_seq 3000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_migration repo reftable
+'
+
 test_expect_success 'migrating from files format deletes backend files' '
 	test_when_finished "rm -rf repo" &&
 	git init --ref-format=files repo &&
-- 
2.47.0





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux