Hi, this is the third version of my patch series that implements ref storage format migrations. Changes compared to v2: - Perform sanity checks for worktrees and reflogs before we create the temporary refdb directory. - Swapped out calls to `remove_path()` to `unlink()`. We do not want to walk up and remove empty parent directories, even though this is harmless in practice. - Release the reftable refdb before removing it. This closes the cached "tables.list" file descriptor, which would otherwise break removal of this file on Windows. - Fix a bug with worktrees where we store the current worktree refdb twice. This caused us to keep file descriptors open, which breaks removal of the refdb on Windows. - Simplify freeing reftable's merged tables. This isn't really needed by this series, but I stumbled over this while investigating why things break on Windows. - Improve error messages to add `strerror(errno)`, which helped me in debugging those errors. - Print path to the migrated refs if things fail after we have populated them such that users can recover. - Fix segfault when releasing a partially initialized "files" ref store. - Some smallish improvements littered across the patches. Thanks! Patrick Patrick Steinhardt (12): setup: unset ref storage when reinitializing repository version refs: convert ref storage format to an enum refs: pass storage format to `ref_store_init()` explicitly refs: allow to skip creation of reflog entries refs/files: refactor `add_pseudoref_and_head_entries()` refs/files: extract function to iterate through root refs refs/files: fix NULL pointer deref when releasing ref store reftable: inline `merged_table_release()` worktree: don't store main worktree twice refs: implement removal of ref storages refs: implement logic to migrate between ref storage formats builtin/refs: new command to migrate ref storage formats .gitignore | 1 + Documentation/git-refs.txt | 62 +++++++ Makefile | 1 + builtin.h | 1 + builtin/clone.c | 2 +- builtin/init-db.c | 2 +- builtin/refs.c | 75 ++++++++ git.c | 1 + refs.c | 340 +++++++++++++++++++++++++++++++++++-- refs.h | 41 ++++- refs/files-backend.c | 123 ++++++++++++-- refs/packed-backend.c | 15 ++ refs/ref-cache.c | 2 + refs/refs-internal.h | 7 + refs/reftable-backend.c | 55 +++++- reftable/merged.c | 12 +- reftable/merged.h | 2 - reftable/stack.c | 8 +- repository.c | 3 +- repository.h | 10 +- setup.c | 10 +- setup.h | 9 +- t/helper/test-ref-store.c | 1 + t/t1460-refs-migrate.sh | 243 ++++++++++++++++++++++++++ worktree.c | 29 ++-- 25 files changed, 974 insertions(+), 81 deletions(-) create mode 100644 Documentation/git-refs.txt create mode 100644 builtin/refs.c create mode 100755 t/t1460-refs-migrate.sh Range-diff against v2: 1: 8b11127daf ! 1: afb705f6a0 setup: unset ref storage when reinitializing repository version @@ Commit message storages though, so this is about to become an issue there. Prepare for this and unset the ref storage format when reinitializing a - repoistory with the "files" format. + repository with the "files" format. Signed-off-by: Patrick Steinhardt <ps@xxxxxx> 2: 25f740f395 = 2: 7989e82dcd refs: convert ref storage format to an enum 3: 6e7b9764f6 = 3: 7d1a86292c refs: pass storage format to `ref_store_init()` explicitly 4: 03f4ac6ee7 = 4: d0539b7456 refs: allow to skip creation of reflog entries 5: 71f31fe66c = 5: 7f9ce5af2e refs/files: refactor `add_pseudoref_and_head_entries()` 6: 6b696690ca = 6: f7577a0ab3 refs/files: extract function to iterate through root refs -: ---------- > 7: 56baa798fb refs/files: fix NULL pointer deref when releasing ref store -: ---------- > 8: c7e8ab40b5 reftable: inline `merged_table_release()` -: ---------- > 9: 7a89aae515 worktree: don't store main worktree twice 7: b758c419c6 ! 10: f9d9420cf9 refs: implement removal of ref storages @@ refs/files-backend.c: static int files_ref_store_create_on_disk(struct ref_store + + strbuf_addf(&buf, "%s/%s", data->gitdir, refname); + -+ ret = remove_path(buf.buf); ++ ret = unlink(buf.buf); + if (ret < 0) + strbuf_addf(data->err, "could not delete %s: %s\n", + refname, strerror(errno)); @@ refs/files-backend.c: static int files_ref_store_create_on_disk(struct ref_store + + strbuf_addf(&sb, "%s/refs", refs->base.gitdir); + if (remove_dir_recursively(&sb, 0) < 0) { -+ strbuf_addstr(err, "could not delete refs"); ++ strbuf_addf(err, "could not delete refs: %s", ++ strerror(errno)); + ret = -1; + } + strbuf_reset(&sb); + + strbuf_addf(&sb, "%s/logs", refs->base.gitdir); + if (remove_dir_recursively(&sb, 0) < 0) { -+ strbuf_addstr(err, "could not delete logs\n"); ++ strbuf_addf(err, "could not delete logs: %s", ++ strerror(errno)); + ret = -1; + } + strbuf_reset(&sb); @@ refs/reftable-backend.c: static int reftable_be_create_on_disk(struct ref_store + struct strbuf sb = STRBUF_INIT; + int ret = 0; + ++ /* ++ * Release the ref store such that all stacks are closed. This is ++ * required so that the "tables.list" file is not open anymore, which ++ * would otherwise make it impossible to remove the file on Windows. ++ */ ++ reftable_be_release(ref_store); ++ + strbuf_addf(&sb, "%s/reftable", refs->base.gitdir); + if (remove_dir_recursively(&sb, 0) < 0) { -+ strbuf_addstr(err, "could not delete reftables"); ++ strbuf_addf(err, "could not delete reftables: %s", ++ strerror(errno)); + ret = -1; + } + strbuf_reset(&sb); + + strbuf_addf(&sb, "%s/HEAD", refs->base.gitdir); -+ if (remove_path(sb.buf) < 0) { -+ strbuf_addstr(err, "could not delete stub HEAD"); ++ if (unlink(sb.buf) < 0) { ++ strbuf_addf(err, "could not delete stub HEAD: %s", ++ strerror(errno)); + ret = -1; + } + strbuf_reset(&sb); + + strbuf_addf(&sb, "%s/refs/heads", refs->base.gitdir); -+ if (remove_path(sb.buf) < 0) { -+ strbuf_addstr(err, "could not delete stub heads"); ++ if (unlink(sb.buf) < 0) { ++ strbuf_addf(err, "could not delete stub heads: %s", ++ strerror(errno)); ++ ret = -1; ++ } ++ strbuf_reset(&sb); ++ ++ strbuf_addf(&sb, "%s/refs", refs->base.gitdir); ++ if (rmdir(sb.buf) < 0) { ++ strbuf_addf(err, "could not delete stub heads: %s", ++ strerror(errno)); + ret = -1; + } + 8: 4d3eb5ea89 ! 11: 1f26051eff refs: implement logic to migrate between ref storage formats @@ Commit message many reflog entries. - We do not lock the repository for concurrent access, and thus - concurrent writes may make use end up with weird in-between states. - There is no way to fully lock the "files" backend for writes due to - its format, and thus we punt on this topic altogether and defer to - the user to avoid those from happening. + concurrent writes may end up with weird in-between states. There is + no way to fully lock the "files" backend for writes due to its + format, and thus we punt on this topic altogether and defer to the + user to avoid those from happening. In other words, this version is a minimum viable product for migrating a repository's ref storage format. It works alright for bare repos, which @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + +struct migration_data { + struct ref_store *old_refs; -+ struct ref_store *new_refs; + struct ref_transaction *transaction; + struct strbuf *errbuf; -+ const char *refname; +}; + +static int migrate_one_ref(const char *refname, const struct object_id *oid, @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + + from_dir = opendir(from_path); + if (!from_dir) { -+ strbuf_addf(errbuf, "could not open source directory: '%s'", from_path); ++ strbuf_addf(errbuf, "could not open source directory '%s': %s", ++ from_path, strerror(errno)); + ret = -1; + goto done; + } @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update +done: + strbuf_release(&from_buf); + strbuf_release(&to_buf); -+ closedir(from_dir); ++ if (from_dir) ++ closedir(from_dir); + return ret; +} + -+static int count_reflogs(const char *reflog, void *payload) ++static int count_reflogs(const char *reflog UNUSED, void *payload) +{ + size_t *reflog_count = payload; + (*reflog_count)++; @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + struct strbuf buf = STRBUF_INIT; + struct migration_data data; + size_t reflog_count = 0; -+ char *new_gitdir; ++ char *new_gitdir = NULL; ++ int did_migrate_refs = 0; + int ret; + + old_refs = get_main_ref_store(repo); + + /* ++ * We do not have any interfaces that would allow us to write many ++ * reflog entries. Once we have them we can remove this restriction. ++ */ ++ if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) { ++ strbuf_addstr(errbuf, "cannot count reflogs"); ++ ret = -1; ++ goto done; ++ } ++ if (reflog_count) { ++ strbuf_addstr(errbuf, "migrating reflogs is not supported yet"); ++ ret = -1; ++ goto done; ++ } ++ ++ /* ++ * Worktrees complicate the migration because every worktree has a ++ * separate ref storage. While it should be feasible to implement, this ++ * is pushed out to a future iteration. ++ * ++ * TODO: we should really be passing the caller-provided repository to ++ * `has_worktrees()`, but our worktree subsystem doesn't yet support ++ * that. ++ */ ++ if (has_worktrees()) { ++ strbuf_addstr(errbuf, "migrating repositories with worktrees is not supported yet"); ++ ret = -1; ++ goto done; ++ } ++ ++ /* + * The overall logic looks like this: + * + * 1. Set up a new temporary directory and initialize it with the new @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + * 6. Change the repository format to the new ref format. + */ + strbuf_addf(&buf, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX"); -+ new_gitdir = mkdtemp(buf.buf); ++ new_gitdir = mkdtemp(xstrdup(buf.buf)); + if (!new_gitdir) { + strbuf_addf(errbuf, "cannot create migration directory: %s", + strerror(errno)); @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + goto done; + } + -+ if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) { -+ strbuf_addstr(errbuf, "cannot count reflogs"); -+ ret = -1; -+ goto done; -+ } -+ if (reflog_count) { -+ strbuf_addstr(errbuf, "migrating reflogs is not supported yet"); -+ ret = -1; -+ goto done; -+ } -+ -+ /* -+ * TODO: we should really be passing the caller-provided repository to -+ * `has_worktrees()`, but our worktree subsystem doesn't yet support -+ * that. -+ */ -+ if (has_worktrees()) { -+ strbuf_addstr(errbuf, "migrating repositories with worktrees is not supported yet"); -+ ret = -1; -+ goto done; -+ } -+ + new_refs = ref_store_init(repo, format, new_gitdir, + REF_STORE_ALL_CAPS); + ret = ref_store_create_on_disk(new_refs, 0, errbuf); @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + goto done; + + data.old_refs = old_refs; -+ data.new_refs = new_refs; + data.transaction = transaction; + data.errbuf = errbuf; + @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + ret = ref_transaction_commit(transaction, errbuf); + if (ret < 0) + goto done; ++ did_migrate_refs = 1; + + if (flags & REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN) { + printf(_("Finished dry-run migration of refs, " @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + ret = move_files(new_gitdir, old_refs->gitdir, errbuf); + if (ret < 0) + goto done; -+ rmdir(new_gitdir); ++ ++ if (rmdir(new_gitdir) < 0) ++ warning_errno(_("could not remove temporary migration directory '%s'"), ++ new_gitdir); + + /* + * We have migrated the repository, so we now need to adjust the @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update + */ + initialize_repository_version(hash_algo_by_ptr(repo->hash_algo), format, 1); + ++ free(new_refs->gitdir); ++ new_refs->gitdir = xstrdup(old_refs->gitdir); + repo->refs_private = new_refs; + ref_store_release(old_refs); + + ret = 0; + +done: ++ if (ret && did_migrate_refs) { ++ strbuf_complete(errbuf, '\n'); ++ strbuf_addf(errbuf, _("migrated refs can be found at '%s'"), ++ new_gitdir); ++ } ++ + if (ret && new_refs) + ref_store_release(new_refs); + ref_transaction_free(transaction); + strbuf_release(&buf); ++ free(new_gitdir); + return ret; +} 9: 0df17a51b4 ! 12: d832414d1f builtin/refs: new command to migrate ref storage formats @@ Documentation/git-refs.txt (new) + Perform the migration, but do not modify the repository. The migrated + refs will be written into a separate directory that can be inspected + separately. The name of the directory will be reported on stdout. This -+ can be used to double check that the migration works as expected doing ++ can be used to double check that the migration works as expected before + performing the actual migration. + +KNOWN LIMITATIONS @@ builtin/refs.c (new) +{ + const char * const migrate_usage[] = { + REFS_MIGRATE_USAGE, -+ NULL ++ NULL, + }; + const char *format_str = NULL; + enum ref_storage_format format; -- 2.45.1.246.gb9cfe4845c.dirty
Attachment:
signature.asc
Description: PGP signature