Patrick Steinhardt <ps@xxxxxx> writes: > On Fri, Feb 07, 2025 at 08:34:37AM +0100, Karthik Nayak wrote: >> Move the tracking of refnames in `affected_refnames` from individual >> backends into the generic layer in 'refs.c'. This centralizes the >> duplicate refname detection that was previously handled separately by >> each backend. > > Exciting, this has been on my TODO list for quite a while already. > Yeah, I saw that you left a TODO in the reftable backend too. This change was not really needed for partial transactions. But it does make things a bit nicer and easier. >> Make some changes to accommodate this move: >> >> - Add a `string_list` field `refnames` to `ref_transaction` to contain >> all the references in a transaction. This field is updated whenever >> a new update is added. >> >> - Modify the backends to use this field internally as needed. The >> backends need to check if an update for refname already exists when >> splitting symrefs or adding an update for 'HEAD'. > > Okay. Is this actually necessary to be handled by the backends? I > would've expected that it is possible to split up symref updates so that > we insert both symref and target into the list. I wouldn't be surprised > if this wasn't easily possible though -- the logic here is surprisingly > intricate. It is a bit intricate and requires a bit of unwinding to move it to the generic layer. But it is possible, I tried to timebox it for this patch series, but unfortunately it needs a lot more time. So perhaps, something for later. > >> - In the reftable backend, in `reftable_be_transaction_prepare()`, >> move the instance of `string_list_has_string()` above >> `ref_transaction_add_update()` to check before the reference is >> added. >> >> This helps reduce duplication of functionality between the backends and >> makes it easier to make changes in a more centralized manner. > >> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> >> --- >> refs.c | 17 ++++++++++++ >> refs/files-backend.c | 69 ++++++++++--------------------------------------- >> refs/packed-backend.c | 25 +----------------- >> refs/refs-internal.h | 2 ++ >> refs/reftable-backend.c | 53 ++++++++++++------------------------- >> 5 files changed, 50 insertions(+), 116 deletions(-) > > Nice. > >> diff --git a/refs.c b/refs.c >> index f4094a326a9f88f979654b668cc9c3d27d83cb5d..4c9b706461977995be1d55e7667f7fb708fbbb76 100644 >> --- a/refs.c >> +++ b/refs.c >> @@ -1175,6 +1175,7 @@ struct ref_transaction *ref_store_transaction_begin(struct ref_store *refs, >> CALLOC_ARRAY(tr, 1); >> tr->ref_store = refs; >> tr->flags = flags; >> + string_list_init_dup(&tr->refnames); > > Do we actually have to duplicate strings? I would've expected that we > keep strings alive via the `ref_update`s anyway during the transaction's > lifetime. > True. I was more thinking along the lines of the keeping the memory concerns separate. Also, I sure if there are any scenario's that a `ref_transaction` could outlive a `ref_update`. > It might also be interesting to check whether using a strset for this > is more efficient. But that is certainly outside the scope of your patch > series and can be done at a later point. #leftoverbit > Agreed. >> @@ -1245,6 +1248,16 @@ struct ref_update *ref_transaction_add_update( >> update->msg = normalize_reflog_message(msg); >> } >> >> + /* >> + * This list is generally used by the backends to avoid duplicates. >> + * But we do support multiple log updates for a given refname within >> + * a single transaction. >> + */ >> + if (!(update->flags & REF_LOG_ONLY)) { >> + item = string_list_append(&transaction->refnames, refname); >> + item->util = update; >> + } >> + >> return update; >> } >> @@ -2397,6 +2410,10 @@ int ref_transaction_prepare(struct ref_transaction *transaction, >> return -1; >> } >> >> + string_list_sort(&transaction->refnames); >> + if (ref_update_reject_duplicates(&transaction->refnames, err)) >> + return TRANSACTION_GENERIC_ERROR; >> + >> ret = refs->be->transaction_prepare(refs, transaction, err); >> if (ret) >> return ret; > > Okay, we keep the list unserted initially, but sort it later before > passing it to the backends so that `string_list_has_string()` works > correctly. Good. > >> diff --git a/refs/files-backend.c b/refs/files-backend.c >> index c6a3f6d6261a894e1c294bb1329fdf8079a39eb4..18da30c3f37dc5c09f7d81a9083d6b41d0463bd5 100644 >> --- a/refs/files-backend.c >> +++ b/refs/files-backend.c >> @@ -2425,7 +2423,6 @@ static int split_head_update(struct ref_update *update, >> */ >> if (strcmp(new_update->refname, "HEAD")) >> BUG("%s unexpectedly not 'HEAD'", new_update->refname); >> - string_list_insert(affected_refnames, new_update->refname); >> >> return 0; >> } > > Previously we would've inserted "HEAD" into the list of affected > refnames even if it wasn't directly updated. Why don't we have to do > that now anymore? > We still do, right above this code, there is a call to `ref_transaction_add_update()`. So any ref_update added to the transaction via `ref_transaction_add_update()` will also add the refname to `transaction->refnames`. >> @@ -2441,7 +2438,6 @@ static int split_head_update(struct ref_update *update, >> @@ -2491,15 +2487,6 @@ static int split_symref_update(struct ref_update *update, >> update->flags |= REF_LOG_ONLY | REF_NO_DEREF; >> update->flags &= ~REF_HAVE_OLD; >> >> - /* >> - * Add the referent. This insertion is O(N) in the transaction >> - * size, but it happens at most once per symref in a >> - * transaction. Make sure to add new_update->refname, which will >> - * be valid as long as affected_refnames is in use, and NOT >> - * referent, which might soon be freed by our caller. >> - */ >> - string_list_insert(affected_refnames, new_update->refname); >> - >> return 0; >> } > > Same question here, but for symref updates. > Same as above. In summary, the need for adding new refnames to the list is now centralized and a part of adding a ref update to the transaction. It's a good question, so I'll also add a hint in the commit message. > >> @@ -3030,13 +2995,8 @@ static int files_transaction_finish_initial(struct files_ref_store *refs, >> if (transaction->state != REF_TRANSACTION_PREPARED) >> BUG("commit called for transaction that is not prepared"); >> >> - /* Fail if a refname appears more than once in the transaction: */ >> - for (i = 0; i < transaction->nr; i++) >> - if (!(transaction->updates[i]->flags & REF_LOG_ONLY)) >> - string_list_append(&affected_refnames, >> - transaction->updates[i]->refname); >> - string_list_sort(&affected_refnames); >> - if (ref_update_reject_duplicates(&affected_refnames, err)) { >> + string_list_sort(&transaction->refnames); >> + if (ref_update_reject_duplicates(&transaction->refnames, err)) { >> ret = TRANSACTION_GENERIC_ERROR; >> goto cleanup; >> } > > Can't we also make this check generic for initial transactions? > This one is handled in the next commit, I mostly separated them out because I was not sure why this needs to be here and to draw attention if I'm missing something when removing this. >> diff --git a/refs/packed-backend.c b/refs/packed-backend.c >> index a7b6f74b6e35f897f619c540cbc600bbd888bc67..6e7acb077e81435715a1ca3cc928550147c8c56a 100644 >> --- a/refs/packed-backend.c >> +++ b/refs/packed-backend.c >> @@ -1653,34 +1648,16 @@ static int packed_transaction_prepare(struct ref_store *ref_store, >> */ >> >> CALLOC_ARRAY(data, 1); >> - string_list_init_nodup(&data->updates); >> >> transaction->backend_data = data; >> >> - /* >> - * Stick the updates in a string list by refname so that we >> - * can sort them: >> - */ >> - for (i = 0; i < transaction->nr; i++) { >> - struct ref_update *update = transaction->updates[i]; >> - struct string_list_item *item = >> - string_list_append(&data->updates, update->refname); >> - >> - /* Store a pointer to update in item->util: */ >> - item->util = update; >> - } >> - string_list_sort(&data->updates); >> - >> - if (ref_update_reject_duplicates(&data->updates, err)) >> - goto failure; >> - >> if (!is_lock_file_locked(&refs->lock)) { >> if (packed_refs_lock(ref_store, 0, err)) >> goto failure; >> data->own_lock = 1; >> } >> >> - if (write_with_updates(refs, &data->updates, err)) >> + if (write_with_updates(refs, &transaction->refnames, err)) >> goto failure; >> >> transaction->state = REF_TRANSACTION_PREPARED; > > This change is a lot more straight-forward because the packed backend > does not support symrefs at all. Nice. > Yes indeed. >> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c >> index d39a14c5a469d7d219362e9eae4f578784d65a5b..dd2099d94948a4f23fd9f7ddc06bf3d741229eba 100644 >> --- a/refs/reftable-backend.c >> +++ b/refs/reftable-backend.c >> @@ -1202,12 +1184,11 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store, >> goto done; >> } >> >> - new_update = ref_transaction_add_update( >> - transaction, "HEAD", >> - u->flags | REF_LOG_ONLY | REF_NO_DEREF, >> - &u->new_oid, &u->old_oid, NULL, NULL, NULL, >> - u->msg); >> - string_list_insert(&affected_refnames, new_update->refname); >> + ref_transaction_add_update( >> + transaction, "HEAD", >> + u->flags | REF_LOG_ONLY | REF_NO_DEREF, >> + &u->new_oid, &u->old_oid, NULL, NULL, NULL, >> + u->msg); >> } >> >> ret = reftable_backend_read_ref(be, rewritten_ref, > > Equivalent question as for the files backend. > I hope my answer earlier helps, especially since the diff here shows the call to `ref_transaction_add_update()`. >> @@ -1277,6 +1258,15 @@ static int reftable_be_transaction_prepare(struct ref_store *ref_store, >> if (!strcmp(rewritten_ref, "HEAD")) >> new_flags |= REF_UPDATE_VIA_HEAD; >> >> + if (string_list_has_string(&transaction->refnames, referent.buf)) { >> + strbuf_addf(err, >> + _("multiple updates for '%s' (including one " >> + "via symref '%s') are not allowed"), >> + referent.buf, u->refname); >> + ret = TRANSACTION_NAME_CONFLICT; >> + goto done; >> + } >> + >> /* >> * If we are updating a symref (eg. HEAD), we should also >> * update the branch that the symref points to. > > This change surprised me a bit. You mention it in the commit message, > but don't state a reason why you do it. > When a `ref_update` is added to the `transaction` via `ref_transaction_add_update()`, the refname is automatically added to `transaction->refnames`. As a result, checking for the refname in this list will always return true. I'll clarify this in the commit message. > Patrick
Attachment:
signature.asc
Description: PGP signature