"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes: Hello brian, > Hi, > > I noticed that Git 2.48 has support for migrating refs when there are > reflogs and, as promised at Git Merge, I decided to try it out. > Unfortunately, I got an error: > Thanks for trying it out and reporting. > > ---- > % git refs migrate --ref-format=reftable --dry-run > error: reftable: transaction failure: corrupt reftable file > ---- > > Here's a small reproduction case: > > ---- > #!/bin/sh > > rm -fr test-repo > git init -b dev test-repo > cd test-repo > > # start first block > touch foo.txt > git add foo.txt > git commit -m + > > head=$(git rev-parse HEAD) > seq 5000 | sed -Ee "s!^(.*)\$!create refs/heads/ref-\1 $head!" | git update-ref --stdin > # end first block > > # start second block > echo abc >bar.txt > git add bar.txt > git commit -m + > head=$(git rev-parse HEAD) > seq 3000 | sed -Ee "s!^(.*)\$!update refs/heads/ref-\1 $head!" | git update-ref --stdin > # end second block > > git refs migrate --ref-format=reftable > ---- > > I can also reproduce this on the latest master. > I can also reproduce it using the script provided here. > > If you remove the second block, it does not appear to reproduce. Some > investigation led me to the conclusion that the difference is when > max_update_index is not 1, the header has the value 1 for it but the > trailer has the correct value, and so we flag the header and trailer as > mismatching and therefore it gets marked as corrupt. I believe the > reason things work when removing the second block is because that value > remains 1, and so it works. > Indeed, you're absolutely correct here, this is because the `max_update_index` is indeed different between the footer and the header. This flow is triggered when we have multiple reftable blocks being written with multiple reflog entries. The multiple blocks is crucial because we update the `max_update_index` right before writing the reflogs. So if there are refs being written and they spawn more than one block, the header is written without the updated `max_update_index`, causing the mismatch. > I haven't done anything else to investigate here, for which I apologize, > but I just wanted to mention it while it was fresh on my mind. > I think you've provided sufficient information and also enough to make it easy for me to debug. > In case this is helpful, I did see this when attempting to migrate two > work repositories with lots of reflogs and many refs (the smaller has > 2983 and the larger, 44832). I obviously cannot send you these > repositories or things in them, but I'm happy to test patches against > them. > I'm attaching a patch below which should fixes the issue for me and also adding a test to test against the same. I'd be grateful if you could also test the patch against the repositoryies you mention. > Please let me know if I can provide more useful information. > -- > brian m. carlson (they/them or he/him) > Toronto, Ontario, CA -- >8 -- Subject: [PATCH] reftable: write correct max_update_index to header In 297c09eabb (refs: allow multiple reflog entries for the same refname, 2024-12-16), the reftable backend learned to handle multiple reflog entries within the same transaction. This was done modifying the `update_index` for reflogs with multiple indices. During writing the logs, the `max_update_index` of the writer was modified to ensure the limits were raised to the modified `update_index`s. However, since ref entries are written before the modification to the `max_update_index`, if there are multiple blocks to be written, the reftable backend writes the header with the old `max_update_index`. When all logs are finally written, the footer will be written with the new `min_update_index`. This causes a mismatch between the header and the footer and causes the reftable file to be corrupted. The existing tests only spawn a single block and since headers are lazily written with the first block, the tests didn't capture this bug. To fix the issue, the appropriate `max_update_index` limit must be set even before the first block is written. Add a `max_index` field to the transaction which holds the `max_index` within all its updates, then propagate this value to the reftable backend, wherein this is used to the set the `max_update_index` correctly. Add a test which creates a few thousand reference updates with multiple reflog entries, which should trigger the bug. Reported-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> --- refs.c | 7 +++++++ refs/refs-internal.h | 1 + refs/reftable-backend.c | 20 ++++++++++---------- t/t1460-refs-migrate.sh | 12 ++++++++++++ 4 files changed, 30 insertions(+), 10 deletions(-) diff --git a/refs.c b/refs.c index 0f41b2fd4a..f7b6f0f897 100644 --- a/refs.c +++ b/refs.c @@ -1345,6 +1345,13 @@ int ref_transaction_update_reflog(struct ref_transaction *transaction, update->flags &= ~REF_HAVE_OLD; update->index = index; + /* + * Reference backends may need to know the max index to optimize + * their writes. So we store the max_index on the transaction level. + */ + if (index > transaction->max_index) + transaction->max_index = index; + return 0; } diff --git a/refs/refs-internal.h b/refs/refs-internal.h index 16550862d3..aaab711bb9 100644 --- a/refs/refs-internal.h +++ b/refs/refs-internal.h @@ -203,6 +203,7 @@ struct ref_transaction { enum ref_transaction_state state; void *backend_data; unsigned int flags; + unsigned int max_index; }; /* diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 00d95a9a2f..289496058e 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -942,6 +942,7 @@ struct write_transaction_table_arg { size_t updates_nr; size_t updates_alloc; size_t updates_expected; + unsigned int max_index; }; struct reftable_transaction_data { @@ -1428,7 +1429,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data struct reftable_log_record *logs = NULL; struct ident_split committer_ident = {0}; size_t logs_nr = 0, logs_alloc = 0, i; - uint64_t max_update_index = ts; const char *committer_info; int ret = 0; @@ -1438,7 +1438,12 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data QSORT(arg->updates, arg->updates_nr, transaction_update_cmp); - reftable_writer_set_limits(writer, ts, ts); + /* + * During reflog migration, we add indexes for a single reflog with + * multiple entries. Each entry will contain a different update_index, + * so set the limits accordingly. + */ + reftable_writer_set_limits(writer, ts, ts + arg->max_index); for (i = 0; i < arg->updates_nr; i++) { struct reftable_transaction_update *tx_update = &arg->updates[i]; @@ -1540,12 +1545,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data */ log->update_index = ts + u->index; - /* - * Note the max update_index so the limit can be set later on. - */ - if (log->update_index > max_update_index) - max_update_index = log->update_index; - log->refname = xstrdup(u->refname); memcpy(log->value.update.new_hash, u->new_oid.hash, GIT_MAX_RAWSZ); @@ -1609,8 +1608,6 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data * and log blocks. */ if (logs) { - reftable_writer_set_limits(writer, ts, max_update_index); - ret = reftable_writer_add_logs(writer, logs, logs_nr); if (ret < 0) goto done; @@ -1631,6 +1628,9 @@ static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED, struct reftable_transaction_data *tx_data = transaction->backend_data; int ret = 0; + if (tx_data->args) + tx_data->args->max_index = transaction->max_index; + for (size_t i = 0; i < tx_data->args_nr; i++) { ret = reftable_addition_add(tx_data->args[i].addition, write_transaction_table, &tx_data->args[i]); diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh index f59bc4860f..307b2998ef 100755 --- a/t/t1460-refs-migrate.sh +++ b/t/t1460-refs-migrate.sh @@ -227,6 +227,18 @@ do done done +test_expect_success 'multiple reftable blocks with multiple entries' ' + test_when_finished "rm -rf repo" && + git init --ref-format=files repo && + test_commit -C repo first && + printf "create refs/heads/ref-%d HEAD\n" $(test_seq 5000) >stdin && + git -C repo update-ref --stdin <stdin && + test_commit -C repo second && + printf "update refs/heads/ref-%d HEAD\n" $(test_seq 3000) >stdin && + git -C repo update-ref --stdin <stdin && + test_migration repo reftable +' + test_expect_success 'migrating from files format deletes backend files' ' test_when_finished "rm -rf repo" && git init --ref-format=files repo && -- 2.47.0
Attachment:
signature.asc
Description: PGP signature