[PATCH v3 00/12] refs: ref storage format migrations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

this is the third version of my patch series that implements ref storage
format migrations.

Changes compared to v2:

  - Perform sanity checks for worktrees and reflogs before we create the
    temporary refdb directory.

  - Swapped out calls to `remove_path()` to `unlink()`. We do not want
    to walk up and remove empty parent directories, even though this is
    harmless in practice.

  - Release the reftable refdb before removing it. This closes the
    cached "tables.list" file descriptor, which would otherwise break
    removal of this file on Windows.

  - Fix a bug with worktrees where we store the current worktree refdb
    twice. This caused us to keep file descriptors open, which breaks
    removal of the refdb on Windows.

  - Simplify freeing reftable's merged tables. This isn't really needed
    by this series, but I stumbled over this while investigating why
    things break on Windows.

  - Improve error messages to add `strerror(errno)`, which helped me in
    debugging those errors.

  - Print path to the migrated refs if things fail after we have
    populated them such that users can recover.

  - Fix segfault when releasing a partially initialized "files" ref
    store.

  - Some smallish improvements littered across the patches.

Thanks!

Patrick

Patrick Steinhardt (12):
  setup: unset ref storage when reinitializing repository version
  refs: convert ref storage format to an enum
  refs: pass storage format to `ref_store_init()` explicitly
  refs: allow to skip creation of reflog entries
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs/files: extract function to iterate through root refs
  refs/files: fix NULL pointer deref when releasing ref store
  reftable: inline `merged_table_release()`
  worktree: don't store main worktree twice
  refs: implement removal of ref storages
  refs: implement logic to migrate between ref storage formats
  builtin/refs: new command to migrate ref storage formats

 .gitignore                 |   1 +
 Documentation/git-refs.txt |  62 +++++++
 Makefile                   |   1 +
 builtin.h                  |   1 +
 builtin/clone.c            |   2 +-
 builtin/init-db.c          |   2 +-
 builtin/refs.c             |  75 ++++++++
 git.c                      |   1 +
 refs.c                     | 340 +++++++++++++++++++++++++++++++++++--
 refs.h                     |  41 ++++-
 refs/files-backend.c       | 123 ++++++++++++--
 refs/packed-backend.c      |  15 ++
 refs/ref-cache.c           |   2 +
 refs/refs-internal.h       |   7 +
 refs/reftable-backend.c    |  55 +++++-
 reftable/merged.c          |  12 +-
 reftable/merged.h          |   2 -
 reftable/stack.c           |   8 +-
 repository.c               |   3 +-
 repository.h               |  10 +-
 setup.c                    |  10 +-
 setup.h                    |   9 +-
 t/helper/test-ref-store.c  |   1 +
 t/t1460-refs-migrate.sh    | 243 ++++++++++++++++++++++++++
 worktree.c                 |  29 ++--
 25 files changed, 974 insertions(+), 81 deletions(-)
 create mode 100644 Documentation/git-refs.txt
 create mode 100644 builtin/refs.c
 create mode 100755 t/t1460-refs-migrate.sh

Range-diff against v2:
 1:  8b11127daf !  1:  afb705f6a0 setup: unset ref storage when reinitializing repository version
    @@ Commit message
         storages though, so this is about to become an issue there.
     
         Prepare for this and unset the ref storage format when reinitializing a
    -    repoistory with the "files" format.
    +    repository with the "files" format.
     
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
     
 2:  25f740f395 =  2:  7989e82dcd refs: convert ref storage format to an enum
 3:  6e7b9764f6 =  3:  7d1a86292c refs: pass storage format to `ref_store_init()` explicitly
 4:  03f4ac6ee7 =  4:  d0539b7456 refs: allow to skip creation of reflog entries
 5:  71f31fe66c =  5:  7f9ce5af2e refs/files: refactor `add_pseudoref_and_head_entries()`
 6:  6b696690ca =  6:  f7577a0ab3 refs/files: extract function to iterate through root refs
 -:  ---------- >  7:  56baa798fb refs/files: fix NULL pointer deref when releasing ref store
 -:  ---------- >  8:  c7e8ab40b5 reftable: inline `merged_table_release()`
 -:  ---------- >  9:  7a89aae515 worktree: don't store main worktree twice
 7:  b758c419c6 ! 10:  f9d9420cf9 refs: implement removal of ref storages
    @@ refs/files-backend.c: static int files_ref_store_create_on_disk(struct ref_store
     +
     +	strbuf_addf(&buf, "%s/%s", data->gitdir, refname);
     +
    -+	ret = remove_path(buf.buf);
    ++	ret = unlink(buf.buf);
     +	if (ret < 0)
     +		strbuf_addf(data->err, "could not delete %s: %s\n",
     +			    refname, strerror(errno));
    @@ refs/files-backend.c: static int files_ref_store_create_on_disk(struct ref_store
     +
     +	strbuf_addf(&sb, "%s/refs", refs->base.gitdir);
     +	if (remove_dir_recursively(&sb, 0) < 0) {
    -+		strbuf_addstr(err, "could not delete refs");
    ++		strbuf_addf(err, "could not delete refs: %s",
    ++			    strerror(errno));
     +		ret = -1;
     +	}
     +	strbuf_reset(&sb);
     +
     +	strbuf_addf(&sb, "%s/logs", refs->base.gitdir);
     +	if (remove_dir_recursively(&sb, 0) < 0) {
    -+		strbuf_addstr(err, "could not delete logs\n");
    ++		strbuf_addf(err, "could not delete logs: %s",
    ++			    strerror(errno));
     +		ret = -1;
     +	}
     +	strbuf_reset(&sb);
    @@ refs/reftable-backend.c: static int reftable_be_create_on_disk(struct ref_store
     +	struct strbuf sb = STRBUF_INIT;
     +	int ret = 0;
     +
    ++	/*
    ++	 * Release the ref store such that all stacks are closed. This is
    ++	 * required so that the "tables.list" file is not open anymore, which
    ++	 * would otherwise make it impossible to remove the file on Windows.
    ++	 */
    ++	reftable_be_release(ref_store);
    ++
     +	strbuf_addf(&sb, "%s/reftable", refs->base.gitdir);
     +	if (remove_dir_recursively(&sb, 0) < 0) {
    -+		strbuf_addstr(err, "could not delete reftables");
    ++		strbuf_addf(err, "could not delete reftables: %s",
    ++			    strerror(errno));
     +		ret = -1;
     +	}
     +	strbuf_reset(&sb);
     +
     +	strbuf_addf(&sb, "%s/HEAD", refs->base.gitdir);
    -+	if (remove_path(sb.buf) < 0) {
    -+		strbuf_addstr(err, "could not delete stub HEAD");
    ++	if (unlink(sb.buf) < 0) {
    ++		strbuf_addf(err, "could not delete stub HEAD: %s",
    ++			    strerror(errno));
     +		ret = -1;
     +	}
     +	strbuf_reset(&sb);
     +
     +	strbuf_addf(&sb, "%s/refs/heads", refs->base.gitdir);
    -+	if (remove_path(sb.buf) < 0) {
    -+		strbuf_addstr(err, "could not delete stub heads");
    ++	if (unlink(sb.buf) < 0) {
    ++		strbuf_addf(err, "could not delete stub heads: %s",
    ++			    strerror(errno));
    ++		ret = -1;
    ++	}
    ++	strbuf_reset(&sb);
    ++
    ++	strbuf_addf(&sb, "%s/refs", refs->base.gitdir);
    ++	if (rmdir(sb.buf) < 0) {
    ++		strbuf_addf(err, "could not delete stub heads: %s",
    ++			    strerror(errno));
     +		ret = -1;
     +	}
     +
 8:  4d3eb5ea89 ! 11:  1f26051eff refs: implement logic to migrate between ref storage formats
    @@ Commit message
             many reflog entries.
     
           - We do not lock the repository for concurrent access, and thus
    -        concurrent writes may make use end up with weird in-between states.
    -        There is no way to fully lock the "files" backend for writes due to
    -        its format, and thus we punt on this topic altogether and defer to
    -        the user to avoid those from happening.
    +        concurrent writes may end up with weird in-between states. There is
    +        no way to fully lock the "files" backend for writes due to its
    +        format, and thus we punt on this topic altogether and defer to the
    +        user to avoid those from happening.
     
         In other words, this version is a minimum viable product for migrating a
         repository's ref storage format. It works alright for bare repos, which
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +
     +struct migration_data {
     +	struct ref_store *old_refs;
    -+	struct ref_store *new_refs;
     +	struct ref_transaction *transaction;
     +	struct strbuf *errbuf;
    -+	const char *refname;
     +};
     +
     +static int migrate_one_ref(const char *refname, const struct object_id *oid,
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +
     +	from_dir = opendir(from_path);
     +	if (!from_dir) {
    -+		strbuf_addf(errbuf, "could not open source directory: '%s'", from_path);
    ++		strbuf_addf(errbuf, "could not open source directory '%s': %s",
    ++			    from_path, strerror(errno));
     +		ret = -1;
     +		goto done;
     +	}
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +done:
     +	strbuf_release(&from_buf);
     +	strbuf_release(&to_buf);
    -+	closedir(from_dir);
    ++	if (from_dir)
    ++		closedir(from_dir);
     +	return ret;
     +}
     +
    -+static int count_reflogs(const char *reflog, void *payload)
    ++static int count_reflogs(const char *reflog UNUSED, void *payload)
     +{
     +	size_t *reflog_count = payload;
     +	(*reflog_count)++;
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +	struct strbuf buf = STRBUF_INIT;
     +	struct migration_data data;
     +	size_t reflog_count = 0;
    -+	char *new_gitdir;
    ++	char *new_gitdir = NULL;
    ++	int did_migrate_refs = 0;
     +	int ret;
     +
     +	old_refs = get_main_ref_store(repo);
     +
     +	/*
    ++	 * We do not have any interfaces that would allow us to write many
    ++	 * reflog entries. Once we have them we can remove this restriction.
    ++	 */
    ++	if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
    ++		strbuf_addstr(errbuf, "cannot count reflogs");
    ++		ret = -1;
    ++		goto done;
    ++	}
    ++	if (reflog_count) {
    ++		strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
    ++		ret = -1;
    ++		goto done;
    ++	}
    ++
    ++	/*
    ++	 * Worktrees complicate the migration because every worktree has a
    ++	 * separate ref storage. While it should be feasible to implement, this
    ++	 * is pushed out to a future iteration.
    ++	 *
    ++	 * TODO: we should really be passing the caller-provided repository to
    ++	 * `has_worktrees()`, but our worktree subsystem doesn't yet support
    ++	 * that.
    ++	 */
    ++	if (has_worktrees()) {
    ++		strbuf_addstr(errbuf, "migrating repositories with worktrees is not supported yet");
    ++		ret = -1;
    ++		goto done;
    ++	}
    ++
    ++	/*
     +	 * The overall logic looks like this:
     +	 *
     +	 *   1. Set up a new temporary directory and initialize it with the new
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +	 *   6. Change the repository format to the new ref format.
     +	 */
     +	strbuf_addf(&buf, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
    -+	new_gitdir = mkdtemp(buf.buf);
    ++	new_gitdir = mkdtemp(xstrdup(buf.buf));
     +	if (!new_gitdir) {
     +		strbuf_addf(errbuf, "cannot create migration directory: %s",
     +			    strerror(errno));
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +		goto done;
     +	}
     +
    -+	if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
    -+		strbuf_addstr(errbuf, "cannot count reflogs");
    -+		ret = -1;
    -+		goto done;
    -+	}
    -+	if (reflog_count) {
    -+		strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
    -+		ret = -1;
    -+		goto done;
    -+	}
    -+
    -+	/*
    -+	 * TODO: we should really be passing the caller-provided repository to
    -+	 * `has_worktrees()`, but our worktree subsystem doesn't yet support
    -+	 * that.
    -+	 */
    -+	if (has_worktrees()) {
    -+		strbuf_addstr(errbuf, "migrating repositories with worktrees is not supported yet");
    -+		ret = -1;
    -+		goto done;
    -+	}
    -+
     +	new_refs = ref_store_init(repo, format, new_gitdir,
     +				  REF_STORE_ALL_CAPS);
     +	ret = ref_store_create_on_disk(new_refs, 0, errbuf);
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +		goto done;
     +
     +	data.old_refs = old_refs;
    -+	data.new_refs = new_refs;
     +	data.transaction = transaction;
     +	data.errbuf = errbuf;
     +
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +	ret = ref_transaction_commit(transaction, errbuf);
     +	if (ret < 0)
     +		goto done;
    ++	did_migrate_refs = 1;
     +
     +	if (flags & REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN) {
     +		printf(_("Finished dry-run migration of refs, "
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +	ret = move_files(new_gitdir, old_refs->gitdir, errbuf);
     +	if (ret < 0)
     +		goto done;
    -+	rmdir(new_gitdir);
    ++
    ++	if (rmdir(new_gitdir) < 0)
    ++		warning_errno(_("could not remove temporary migration directory '%s'"),
    ++			      new_gitdir);
     +
     +	/*
     +	 * We have migrated the repository, so we now need to adjust the
    @@ refs.c: int ref_update_check_old_target(const char *referent, struct ref_update
     +	 */
     +	initialize_repository_version(hash_algo_by_ptr(repo->hash_algo), format, 1);
     +
    ++	free(new_refs->gitdir);
    ++	new_refs->gitdir = xstrdup(old_refs->gitdir);
     +	repo->refs_private = new_refs;
     +	ref_store_release(old_refs);
     +
     +	ret = 0;
     +
     +done:
    ++	if (ret && did_migrate_refs) {
    ++		strbuf_complete(errbuf, '\n');
    ++		strbuf_addf(errbuf, _("migrated refs can be found at '%s'"),
    ++			    new_gitdir);
    ++	}
    ++
     +	if (ret && new_refs)
     +		ref_store_release(new_refs);
     +	ref_transaction_free(transaction);
     +	strbuf_release(&buf);
    ++	free(new_gitdir);
     +	return ret;
     +}
     
 9:  0df17a51b4 ! 12:  d832414d1f builtin/refs: new command to migrate ref storage formats
    @@ Documentation/git-refs.txt (new)
     +	Perform the migration, but do not modify the repository. The migrated
     +	refs will be written into a separate directory that can be inspected
     +	separately. The name of the directory will be reported on stdout. This
    -+	can be used to double check that the migration works as expected doing
    ++	can be used to double check that the migration works as expected before
     +	performing the actual migration.
     +
     +KNOWN LIMITATIONS
    @@ builtin/refs.c (new)
     +{
     +	const char * const migrate_usage[] = {
     +		REFS_MIGRATE_USAGE,
    -+		NULL
    ++		NULL,
     +	};
     +	const char *format_str = NULL;
     +	enum ref_storage_format format;
-- 
2.45.1.246.gb9cfe4845c.dirty

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux