Re: [PATCH v2 8/9] refs: implement logic to migrate between ref storage formats

Justin Tobler <jltobler@xxxxxxxxx> · Fri, 24 May 2024 17:32:20 -0500

On 24/05/24 12:15PM, Patrick Steinhardt wrote:
> With the introduction of the new "reftable" backend, users may want to
> migrate repositories between the backends without having to recreate the
> whole repository. Add the logic to do so.
> 
> The implementation is generic and works with arbitrary ref storage
> formats so that a backend does not need to implement any migration
> logic. It does have a few limitations though:
> 
>   - We do not migrate repositories with worktrees, because worktrees
>     have separate ref storages. It makes the overall affair more complex
>     if we have to migrate multiple storages at once.
> 
>   - We do not migrate reflogs, because we have no interfaces to write
>     many reflog entries.
> 
>   - We do not lock the repository for concurrent access, and thus
>     concurrent writes may make use end up with weird in-between states.

Let's drop the "make use" in this line.

>     There is no way to fully lock the "files" backend for writes due to
>     its format, and thus we punt on this topic altogether and defer to
>     the user to avoid those from happening.
> 
> In other words, this version is a minimum viable product for migrating a
> repository's ref storage format. It works alright for bare repos, which
> often have neither worktrees nor reflogs. But it will not work for many
> other repositories without some preparations. These limitations are not
> set into stone though, and ideally we will eventually address them over
> time.
> 
> The logic is not yet used by anything, and thus there are no tests for
> it. Those will be added in the next commit.
[snip]
> +int repo_migrate_ref_storage_format(struct repository *repo,
> +				    enum ref_storage_format format,
> +				    unsigned int flags,
> +				    struct strbuf *errbuf)
> +{
> +	struct ref_store *old_refs = NULL, *new_refs = NULL;
> +	struct ref_transaction *transaction = NULL;
> +	struct strbuf buf = STRBUF_INIT;
> +	struct migration_data data;
> +	size_t reflog_count = 0;
> +	char *new_gitdir;
> +	int ret;
> +
> +	old_refs = get_main_ref_store(repo);
> +
> +	/*
> +	 * The overall logic looks like this:
> +	 *
> +	 *   1. Set up a new temporary directory and initialize it with the new
> +	 *      format. This is where all refs will be migrated into.
> +	 *
> +	 *   2. Enumerate all refs and write them into the new ref storage.
> +	 *      This operation is safe as we do not yet modify the main
> +	 *      repository.
> +	 *
> +	 *   3. If we're in dry-run mode then we are done and can hand over the
> +	 *      directory to the caller for inspection. If not, we now start
> +	 *      with the destructive part.
> +	 *
> +	 *   4. Delete the old ref storage from disk. As we have a copy of refs
> +	 *      in the new ref storage it's okay(ish) if we now get interrupted
> +	 *      as there is an equivalent copy of all refs available.
> +	 *
> +	 *   5. Move the new ref storage files into place.
> +	 *
> +	 *   6. Change the repository format to the new ref format.
> +	 */
> +	strbuf_addf(&buf, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
> +	new_gitdir = mkdtemp(buf.buf);
> +	if (!new_gitdir) {
> +		strbuf_addf(errbuf, "cannot create migration directory: %s",
> +			    strerror(errno));
> +		ret = -1;
> +		goto done;
> +	}

If the repository contains reflogs or has worktrees, the migration does
not proceed. This means that the created tempdir gets left behind with
no indication, and it would be left to the user to clean it up.

Instead, we could move tempdir creation to after these checks so it is
not needlessly created.
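
Roughly what I have in mind, as a standalone sketch rather than a diff
against this patch. `struct fake_repo` and `prepare_migration_dir()` are
hypothetical stand-ins for the real ref store and the reflog/worktree
checks; only the ordering is the point:

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the state the real checks would inspect. */
struct fake_repo {
	const char *gitdir;
	size_t reflog_count;
	int has_worktrees;
};

/*
 * Run the precondition checks first and create the temporary directory
 * last, so that bailing out early leaves nothing behind on disk.
 * On success, `path` holds the name of the created directory.
 */
static int prepare_migration_dir(const struct fake_repo *repo,
				 char *path, size_t pathlen,
				 char *err, size_t errlen)
{
	int n;

	if (repo->reflog_count) {
		snprintf(err, errlen, "migrating reflogs is not supported yet");
		return -1;
	}
	if (repo->has_worktrees) {
		snprintf(err, errlen,
			 "migrating repositories with worktrees is not supported yet");
		return -1;
	}

	/* Only now touch the filesystem. */
	n = snprintf(path, pathlen, "%s/ref_migration.XXXXXX", repo->gitdir);
	if (n < 0 || (size_t)n >= pathlen) {
		snprintf(err, errlen, "migration directory path too long");
		return -1;
	}
	if (!mkdtemp(path)) {
		snprintf(err, errlen, "cannot create migration directory: %s",
			 strerror(errno));
		return -1;
	}
	return 0;
}
```

With this ordering, a repository that has reflogs or worktrees fails
before `mkdtemp()` is ever called, so there is no directory to clean up.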

> +
> +	if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
> +		strbuf_addstr(errbuf, "cannot count reflogs");
> +		ret = -1;
> +		goto done;
> +	}
> +	if (reflog_count) {
> +		strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
> +		ret = -1;
> +		goto done;
> +	}
> +
> +	/*
> +	 * TODO: we should really be passing the caller-provided repository to
> +	 * `has_worktrees()`, but our worktree subsystem doesn't yet support
> +	 * that.
> +	 */
> +	if (has_worktrees()) {
> +		strbuf_addstr(errbuf, "migrating repositories with worktrees is not supported yet");
> +		ret = -1;
> +		goto done;
> +	}
[snip]
> +	/*
> +	 * Until now we were in the non-destructive phase, where we only
> +	 * populated the new ref store. From hereon though we are about
> +	 * to get hands by deleting the old ref store and then moving
> +	 * the new one into place.
> +	 *
> +	 * Assuming that there were no concurrent writes, the new ref
> +	 * store should have all information. So if we fail from hereon
> +	 * we may be in an in-between state, but it would still be able
> +	 * to recover by manually moving remaining files from the
> +	 * temporary migration directory into place.
> +	 */

If there is a failure after this point, should we provide a hint to the
user that the references exist in the tempdir?
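
Something along these lines, again as a standalone sketch. The helper
name `append_recovery_hint()` and the hint wording are made up for
illustration; in the patch this would presumably go through `errbuf`
instead of a fixed-size buffer:

```c
#include <stdio.h>
#include <string.h>

/*
 * Append a recovery hint naming the temporary migration directory to an
 * existing error message. Returns 0 on success, -1 if the hint did not
 * fit into the buffer.
 */
static int append_recovery_hint(char *err, size_t errlen, const char *tmpdir)
{
	size_t used = strlen(err);
	size_t left;
	int n;

	if (used >= errlen)
		return -1;
	left = errlen - used;

	n = snprintf(err + used, left,
		     "\nhint: migrated refs are still available in '%s'",
		     tmpdir);
	if (n < 0 || (size_t)n >= left)
		return -1;
	return 0;
}
```

That way a failure in the destructive phase at least tells the user
where the complete copy of the refs lives.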

-Justin