Patrick Steinhardt <ps@xxxxxx> writes: > On Mon, Dec 09, 2024 at 12:07:20PM +0100, Karthik Nayak wrote: >> The reference transaction only allows a update for a given reference to >> avoid conflicts. This, however, isn't an issue for reflogs. There are no >> conflicts to be resolved in reflogs and when migrating reflogs between >> backends we'd have multiple reflog entries for the same refname. >> >> So allow multiple reflog updates within a single transaction. Also the >> reflog creation logic isn't exposed to the end user. While this might >> change in the future, currently, this reduces the scope of issues to >> think about. >> >> This is required to add reflog migration support to `git refs migrate` >> which currently doesn't support it. > > Nit: the second half of this sentence starting with "which currently..." > feels rather pointless, as it's implicit in the first half. I'd just > drop it. > Will do. >> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx> >> --- >> refs/files-backend.c | 15 +++++++++++---- >> refs/reftable-backend.c | 16 +++++++++++++--- >> 2 files changed, 24 insertions(+), 7 deletions(-) >> >> diff --git a/refs/files-backend.c b/refs/files-backend.c >> index 32975e0fd7a03ab8ddf99c0a68af99921d3f5090..10fba1e97b967fbc04c62a0a6d7d9648ce1c51fb 100644 >> --- a/refs/files-backend.c >> +++ b/refs/files-backend.c >> @@ -2612,6 +2612,9 @@ static int lock_ref_for_update(struct files_ref_store *refs, >> >> update->backend_data = lock; >> >> + if (update->flags & REF_LOG_ONLY) >> + goto out; >> + >> if (update->type & REF_ISSYMREF) { >> if (update->flags & REF_NO_DEREF) { >> /* > > Hm. Does this mean that we don't lock at all for REF_LOG_ONLY updates? > Reflogs themselves have no lockfile, so isn't it mandatory that we lock > the corresponding ref like we used to do? Otherwise I cannot see how we > avoid two concurrent writers to the same reflog. > No it doesn't, this is after the lock is obtained. We simply exit early since for reflog only updates, there is nothing further to do. >> @@ -3036,8 +3042,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs, >> >> /* Fail if a refname appears more than once in the transaction: */ >> for (i = 0; i < transaction->nr; i++) >> - string_list_append(&affected_refnames, >> - transaction->updates[i]->refname); >> + if (!(transaction->updates[i]->flags & REF_LOG_ONLY)) >> + string_list_append(&affected_refnames, >> + transaction->updates[i]->refname); >> string_list_sort(&affected_refnames); >> if (ref_update_reject_duplicates(&affected_refnames, err)) { >> ret = TRANSACTION_GENERIC_ERROR; > > This on the other hand is sensible -- having multiple REF_LOG_ONLY > transactions queued is fine. > >> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c >> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..d9d2e28122a00ddd7f835c35a5851e390761885b 100644 >> --- a/refs/reftable-backend.c >> +++ b/refs/reftable-backend.c >> @@ -1405,7 +1410,11 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data >> } >> >> fill_reftable_log_record(log, &c); >> - log->update_index = ts; >> + >> + update_index = strintmap_get(&logs_ts, u->refname); >> + log->update_index = update_index; >> + strintmap_set(&logs_ts, u->refname, update_index+1); > > So we're now tracking update indices via another map in order to ensure > that the update index will be increased if we have multiple reflog > entries for the same refname. Can we avoid that overhead by instead just > having a global update index counter that increases for every single > reflog entry, regardless of whether we have multiple ones queued up for > the same reference? > > I guess the result would be kind of weird as a single transaction with > multiple ref updates would now always contain N different update > indices. Maybe there's an alternative that allows us to reduce the cost, > like only doing this for REF_LOG_ONLY updates? > > I'm mostly being careful because this here is the hot loop of writing > refs, so I don't want to regress performance. Thanks for bringing this up. I was thinking hard about this for a while. I also did some local benchmarking, for 10000 atomic writes, I couldn't find a note-worthy regression. But I really like the point you made about how this could probably use a counter. I think we can use `u->index`. The index field was added to ensure that the logs stay in order when we sort inside `write_transaction_table()`. But we can use the same here. We can simply do log->update_index = ts + u->index; This would increment the update_index as needed and also only kick in when the `index` itself is set. Which would be only for reflog migration between backends. I think this would work well. Karthik
Attachment:
signature.asc
Description: PGP signature