Re: [PATCH 2/6] refs: move duplicate refname update check to generic layer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 07, 2025 at 08:34:37AM +0100, Karthik Nayak wrote:
> Move the tracking of refnames in `affected_refnames` from individual
> backends into the generic layer in 'refs.c'. This centralizes the
> duplicate refname detection that was previously handled separately by
> each backend.

Exciting, this has been on my TODO list for quite a while already.

> Make some changes to accommodate this move:
> 
>   - Add a `string_list` field `refnames` to `ref_transaction` to contain
>     all the references in a transaction. This field is updated whenever
>     a new update is added.
> 
>   - Modify the backends to use this field internally as needed. The
>     backends need to check if an update for refname already exists when
>     splitting symrefs or adding an update for 'HEAD'.

Okay. Is this actually necessary to be handled by the backends? I
would've expected that it is possible to split up symref updates so that
we insert both symref and target into the list. I wouldn't be surprised
if this wasn't easily possible though -- the logic here is surprisingly
intricate.

>   - In the reftable backend, in `reftable_be_transaction_prepare()`,
>     move the instance of `string_list_has_string()` above
>     `ref_transaction_add_update()` to check before the reference is
>     added.
> 
> This helps reduce duplication of functionality between the backends and
> makes it easier to make changes in a more centralized manner.

> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
> ---
>  refs.c                  | 17 ++++++++++++
>  refs/files-backend.c    | 69 ++++++++++---------------------------------------
>  refs/packed-backend.c   | 25 +-----------------
>  refs/refs-internal.h    |  2 ++
>  refs/reftable-backend.c | 53 ++++++++++++-------------------------
>  5 files changed, 50 insertions(+), 116 deletions(-)

Nice.

> diff --git a/refs.c b/refs.c
> index f4094a326a9f88f979654b668cc9c3d27d83cb5d..4c9b706461977995be1d55e7667f7fb708fbbb76 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1175,6 +1175,7 @@ struct ref_transaction *ref_store_transaction_begin(struct ref_store *refs,
>  	CALLOC_ARRAY(tr, 1);
>  	tr->ref_store = refs;
>  	tr->flags = flags;
> +	string_list_init_dup(&tr->refnames);

Do we actually have to duplicate strings? I would've expected that we
keep strings alive via the `ref_update`s anyway during the transaction's
lifetime.

It might also be interesting to check whether using a strset for this
is more efficient. But that is certainly outside the scope of your patch
series and can be done at a later point. #leftoverbit

> @@ -1245,6 +1248,16 @@ struct ref_update *ref_transaction_add_update(
>  		update->msg = normalize_reflog_message(msg);
>  	}
>  
> +	/*
> +	 * This list is generally used by the backends to avoid duplicates.
> +	 * But we do support multiple log updates for a given refname within
> +	 * a single transaction.
> +	 */
> +	if (!(update->flags & REF_LOG_ONLY)) {
> +		item = string_list_append(&transaction->refnames, refname);
> +		item->util = update;
> +	}
> +
>  	return update;
>  }
> @@ -2397,6 +2410,10 @@ int ref_transaction_prepare(struct ref_transaction *transaction,
>  		return -1;
>  	}
>  
> +	string_list_sort(&transaction->refnames);
> +	if (ref_update_reject_duplicates(&transaction->refnames, err))
> +		return TRANSACTION_GENERIC_ERROR;
> +
>  	ret = refs->be->transaction_prepare(refs, transaction, err);
>  	if (ret)
>  		return ret;

Okay, we keep the list unserted initially, but sort it later before
passing it to the backends so that `string_list_has_string()` works
correctly. Good.

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index c6a3f6d6261a894e1c294bb1329fdf8079a39eb4..18da30c3f37dc5c09f7d81a9083d6b41d0463bd5 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2425,7 +2423,6 @@ static int split_head_update(struct ref_update *update,
>  	 */
>  	if (strcmp(new_update->refname, "HEAD"))
>  		BUG("%s unexpectedly not 'HEAD'", new_update->refname);
> -	string_list_insert(affected_refnames, new_update->refname);
>  
>  	return 0;
>  }

Previously we would've inserted "HEAD" into the list of affected
refnames even if it wasn't directly updated. Why don't we have to do
that now anymore?

> @@ -2441,7 +2438,6 @@ static int split_head_update(struct ref_update *update,
> @@ -2491,15 +2487,6 @@ static int split_symref_update(struct ref_update *update,
>  	update->flags |= REF_LOG_ONLY | REF_NO_DEREF;
>  	update->flags &= ~REF_HAVE_OLD;
>  
> -	/*
> -	 * Add the referent. This insertion is O(N) in the transaction
> -	 * size, but it happens at most once per symref in a
> -	 * transaction. Make sure to add new_update->refname, which will
> -	 * be valid as long as affected_refnames is in use, and NOT
> -	 * referent, which might soon be freed by our caller.
> -	 */
> -	string_list_insert(affected_refnames, new_update->refname);
> -
>  	return 0;
>  }

Same question here, but for symref updates.

> @@ -3030,13 +2995,8 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>  	if (transaction->state != REF_TRANSACTION_PREPARED)
>  		BUG("commit called for transaction that is not prepared");
>  
> -	/* Fail if a refname appears more than once in the transaction: */
> -	for (i = 0; i < transaction->nr; i++)
> -		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
> -			string_list_append(&affected_refnames,
> -					   transaction->updates[i]->refname);
> -	string_list_sort(&affected_refnames);
> -	if (ref_update_reject_duplicates(&affected_refnames, err)) {
> +	string_list_sort(&transaction->refnames);
> +	if (ref_update_reject_duplicates(&transaction->refnames, err)) {
>  		ret = TRANSACTION_GENERIC_ERROR;
>  		goto cleanup;
>  	}

Can't we also make this check generic for initial transactions?

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index a7b6f74b6e35f897f619c540cbc600bbd888bc67..6e7acb077e81435715a1ca3cc928550147c8c56a 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1653,34 +1648,16 @@ static int packed_transaction_prepare(struct ref_store *ref_store,
>  	 */
>  
>  	CALLOC_ARRAY(data, 1);
> -	string_list_init_nodup(&data->updates);
>  
>  	transaction->backend_data = data;
>  
> -	/*
> -	 * Stick the updates in a string list by refname so that we
> -	 * can sort them:
> -	 */
> -	for (i = 0; i < transaction->nr; i++) {
> -		struct ref_update *update = transaction->updates[i];
> -		struct string_list_item *item =
> -			string_list_append(&data->updates, update->refname);
> -
> -		/* Store a pointer to update in item->util: */
> -		item->util = update;
> -	}
> -	string_list_sort(&data->updates);
> -
> -	if (ref_update_reject_duplicates(&data->updates, err))
> -		goto failure;
> -
>  	if (!is_lock_file_locked(&refs->lock)) {
>  		if (packed_refs_lock(ref_store, 0, err))
>  			goto failure;
>  		data->own_lock = 1;
>  	}
>  
> -	if (write_with_updates(refs, &data->updates, err))
> +	if (write_with_updates(refs, &transaction->refnames, err))
>  		goto failure;
>  
>  	transaction->state = REF_TRANSACTION_PREPARED;

This change is a lot more straight-forward because the packed backend
does not support symrefs at all. Nice.

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index d39a14c5a469d7d219362e9eae4f578784d65a5b..dd2099d94948a4f23fd9f7ddc06bf3d741229eba 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -1202,12 +1184,11 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  				goto done;
>  			}
>  
> -			new_update = ref_transaction_add_update(
> -					transaction, "HEAD",
> -					u->flags | REF_LOG_ONLY | REF_NO_DEREF,
> -					&u->new_oid, &u->old_oid, NULL, NULL, NULL,
> -					u->msg);
> -			string_list_insert(&affected_refnames, new_update->refname);
> +			ref_transaction_add_update(
> +				transaction, "HEAD",
> +				u->flags | REF_LOG_ONLY | REF_NO_DEREF,
> +				&u->new_oid, &u->old_oid, NULL, NULL, NULL,
> +				u->msg);
>  		}
>  
>  		ret = reftable_backend_read_ref(be, rewritten_ref,

Equivalent question as for the files backend.

> @@ -1277,6 +1258,15 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store,
>  				if (!strcmp(rewritten_ref, "HEAD"))
>  					new_flags |= REF_UPDATE_VIA_HEAD;
>  
> +				if (string_list_has_string(&transaction->refnames, referent.buf)) {
> +					strbuf_addf(err,
> +						    _("multiple updates for '%s' (including one "
> +						    "via symref '%s') are not allowed"),
> +						    referent.buf, u->refname);
> +					ret = TRANSACTION_NAME_CONFLICT;
> +					goto done;
> +				}
> +
>  				/*
>  				 * If we are updating a symref (eg. HEAD), we should also
>  				 * update the branch that the symref points to.

This change surprised me a bit. You mention it in the commit message,
but don't state a reason why you do it.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux