On Fri, Feb 07, 2025 at 08:34:37AM +0100, Karthik Nayak wrote: > Move the tracking of refnames in `affected_refnames` from individual > backends into the generic layer in 'refs.c'. This centralizes the > duplicate refname detection that was previously handled separately by > each backend. Exciting, this has been on my TODO list for quite a while already. > Make some changes to accommodate this move: > > - Add a `string_list` field `refnames` to `ref_transaction` to contain > all the references in a transaction. This field is updated whenever > a new update is added. > > - Modify the backends to use this field internally as needed. The > backends need to check if an update for refname already exists when > splitting symrefs or adding an update for 'HEAD'. Okay. Is this actually necessary to be handled by the backends? I would've expected that it is possible to split up symref updates so that we insert both symref and target into the list. I wouldn't be surprised if this wasn't easily possible though -- the logic here is surprisingly intricate. > - In the reftable backend, in `reftable_be_transaction_prepare()`, > move the instance of `string_list_has_string()` above > `ref_transaction_add_update()` to check before the reference is > added. > > This helps reduce duplication of functionality between the backends and > makes it easier to make changes in a more centralized manner. > Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> > --- > refs.c | 17 ++++++++++++ > refs/files-backend.c | 69 ++++++++++--------------------------------------- > refs/packed-backend.c | 25 +----------------- > refs/refs-internal.h | 2 ++ > refs/reftable-backend.c | 53 ++++++++++++------------------------- > 5 files changed, 50 insertions(+), 116 deletions(-) Nice. > diff --git a/refs.c b/refs.c > index f4094a326a9f88f979654b668cc9c3d27d83cb5d..4c9b706461977995be1d55e7667f7fb708fbbb76 100644 > --- a/refs.c > +++ b/refs.c > @@ -1175,6 +1175,7 @@ struct ref_transaction *ref_store_transaction_begin(struct ref_store *refs, > CALLOC_ARRAY(tr, 1); > tr->ref_store = refs; > tr->flags = flags; > + string_list_init_dup(&tr->refnames); Do we actually have to duplicate strings? I would've expected that we keep strings alive via the `ref_update`s anyway during the transaction's lifetime. It might also be interesting to check whether using a strset for this is more efficient. But that is certainly outside the scope of your patch series and can be done at a later point. #leftoverbit > @@ -1245,6 +1248,16 @@ struct ref_update *ref_transaction_add_update( > update->msg = normalize_reflog_message(msg); > } > > + /* > + * This list is generally used by the backends to avoid duplicates. > + * But we do support multiple log updates for a given refname within > + * a single transaction. > + */ > + if (!(update->flags & REF_LOG_ONLY)) { > + item = string_list_append(&transaction->refnames, refname); > + item->util = update; > + } > + > return update; > } > @@ -2397,6 +2410,10 @@ int ref_transaction_prepare(struct ref_transaction *transaction, > return -1; > } > > + string_list_sort(&transaction->refnames); > + if (ref_update_reject_duplicates(&transaction->refnames, err)) > + return TRANSACTION_GENERIC_ERROR; > + > ret = refs->be->transaction_prepare(refs, transaction, err); > if (ret) > return ret; Okay, we keep the list unserted initially, but sort it later before passing it to the backends so that `string_list_has_string()` works correctly. Good. > diff --git a/refs/files-backend.c b/refs/files-backend.c > index c6a3f6d6261a894e1c294bb1329fdf8079a39eb4..18da30c3f37dc5c09f7d81a9083d6b41d0463bd5 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -2425,7 +2423,6 @@ static int split_head_update(struct ref_update *update, > */ > if (strcmp(new_update->refname, "HEAD")) > BUG("%s unexpectedly not 'HEAD'", new_update->refname); > - string_list_insert(affected_refnames, new_update->refname); > > return 0; > } Previously we would've inserted "HEAD" into the list of affected refnames even if it wasn't directly updated. Why don't we have to do that now anymore? > @@ -2441,7 +2438,6 @@ static int split_head_update(struct ref_update *update, > @@ -2491,15 +2487,6 @@ static int split_symref_update(struct ref_update *update, > update->flags |= REF_LOG_ONLY | REF_NO_DEREF; > update->flags &= ~REF_HAVE_OLD; > > - /* > - * Add the referent. This insertion is O(N) in the transaction > - * size, but it happens at most once per symref in a > - * transaction. Make sure to add new_update->refname, which will > - * be valid as long as affected_refnames is in use, and NOT > - * referent, which might soon be freed by our caller. > - */ > - string_list_insert(affected_refnames, new_update->refname); > - > return 0; > } Same question here, but for symref updates. > @@ -3030,13 +2995,8 @@ static int files_transaction_finish_initial(struct files_ref_store *refs, > if (transaction->state != REF_TRANSACTION_PREPARED) > BUG("commit called for transaction that is not prepared"); > > - /* Fail if a refname appears more than once in the transaction: */ > - for (i = 0; i < transaction->nr; i++) > - if (!(transaction->updates[i]->flags & REF_LOG_ONLY)) > - string_list_append(&affected_refnames, > - transaction->updates[i]->refname); > - string_list_sort(&affected_refnames); > - if (ref_update_reject_duplicates(&affected_refnames, err)) { > + string_list_sort(&transaction->refnames); > + if (ref_update_reject_duplicates(&transaction->refnames, err)) { > ret = TRANSACTION_GENERIC_ERROR; > goto cleanup; > } Can't we also make this check generic for initial transactions? > diff --git a/refs/packed-backend.c b/refs/packed-backend.c > index a7b6f74b6e35f897f619c540cbc600bbd888bc67..6e7acb077e81435715a1ca3cc928550147c8c56a 100644 > --- a/refs/packed-backend.c > +++ b/refs/packed-backend.c > @@ -1653,34 +1648,16 @@ static int packed_transaction_prepare(struct ref_store *ref_store, > */ > > CALLOC_ARRAY(data, 1); > - string_list_init_nodup(&data->updates); > > transaction->backend_data = data; > > - /* > - * Stick the updates in a string list by refname so that we > - * can sort them: > - */ > - for (i = 0; i < transaction->nr; i++) { > - struct ref_update *update = transaction->updates[i]; > - struct string_list_item *item = > - string_list_append(&data->updates, update->refname); > - > - /* Store a pointer to update in item->util: */ > - item->util = update; > - } > - string_list_sort(&data->updates); > - > - if (ref_update_reject_duplicates(&data->updates, err)) > - goto failure; > - > if (!is_lock_file_locked(&refs->lock)) { > if (packed_refs_lock(ref_store, 0, err)) > goto failure; > data->own_lock = 1; > } > > - if (write_with_updates(refs, &data->updates, err)) > + if (write_with_updates(refs, &transaction->refnames, err)) > goto failure; > > transaction->state = REF_TRANSACTION_PREPARED; This change is a lot more straight-forward because the packed backend does not support symrefs at all. Nice. > diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c > index d39a14c5a469d7d219362e9eae4f578784d65a5b..dd2099d94948a4f23fd9f7ddc06bf3d741229eba 100644 > --- a/refs/reftable-backend.c > +++ b/refs/reftable-backend.c > @@ -1202,12 +1184,11 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store, > goto done; > } > > - new_update = ref_transaction_add_update( > - transaction, "HEAD", > - u->flags | REF_LOG_ONLY | REF_NO_DEREF, > - &u->new_oid, &u->old_oid, NULL, NULL, NULL, > - u->msg); > - string_list_insert(&affected_refnames, new_update->refname); > + ref_transaction_add_update( > + transaction, "HEAD", > + u->flags | REF_LOG_ONLY | REF_NO_DEREF, > + &u->new_oid, &u->old_oid, NULL, NULL, NULL, > + u->msg); > } > > ret = reftable_backend_read_ref(be, rewritten_ref, Equivalent question as for the files backend. > @@ -1277,6 +1258,15 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store, > if (!strcmp(rewritten_ref, "HEAD")) > new_flags |= REF_UPDATE_VIA_HEAD; > > + if (string_list_has_string(&transaction->refnames, referent.buf)) { > + strbuf_addf(err, > + _("multiple updates for '%s' (including one " > + "via symref '%s') are not allowed"), > + referent.buf, u->refname); > + ret = TRANSACTION_NAME_CONFLICT; > + goto done; > + } > + > /* > * If we are updating a symref (eg. HEAD), we should also > * update the branch that the symref points to. This change surprised me a bit. You mention it in the commit message, but don't state a reason why you do it. Patrick