Re: [PATCH 6/7] refs: allow multiple reflog entries for the same refname

karthik nayak <karthik.188@xxxxxxxxx> · Thu, 12 Dec 2024 06:47:54 -0800

Patrick Steinhardt <ps@xxxxxx> writes:

> On Mon, Dec 09, 2024 at 12:07:20PM +0100, Karthik Nayak wrote:
>> The reference transaction only allows a update for a given reference to
>> avoid conflicts. This, however, isn't an issue for reflogs. There are no
>> conflicts to be resolved in reflogs and when migrating reflogs between
>> backends we'd have multiple reflog entries for the same refname.
>>
>> So allow multiple reflog updates within a single transaction. Also the
>> reflog creation logic isn't exposed to the end user. While this might
>> change in the future, currently, this reduces the scope of issues to
>> think about.
>>
>> This is required to add reflog migration support to `git refs migrate`
>> which currently doesn't support it.
>
> Nit: the second half of this sentence starting with "which currently..."
> feels rather pointless, as it's implicit in the first half. I'd just
> drop it.
>

Will do.

>> Signed-off-by: Karthik Nayak <karthik.188@xxxxxxxxx>
>> ---
>>  refs/files-backend.c    | 15 +++++++++++----
>>  refs/reftable-backend.c | 16 +++++++++++++---
>>  2 files changed, 24 insertions(+), 7 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index 32975e0fd7a03ab8ddf99c0a68af99921d3f5090..10fba1e97b967fbc04c62a0a6d7d9648ce1c51fb 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -2612,6 +2612,9 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>>
>>  	update->backend_data = lock;
>>
>> +	if (update->flags & REF_LOG_ONLY)
>> +		goto out;
>> +
>>  	if (update->type & REF_ISSYMREF) {
>>  		if (update->flags & REF_NO_DEREF) {
>>  			/*
>
> Hm. Does this mean that we don't lock at all for REF_LOG_ONLY updates?
> Reflogs themselves have no lockfile, so isn't it mandatory that we lock
> the corresponding ref like we used to do? Otherwise I cannot see how we
> avoid two concurrent writers to the same reflog.
>

No it doesn't, this is after the lock is obtained. We simply exit early
since for reflog only updates, there is nothing further to do.

>> @@ -3036,8 +3042,9 @@ static int files_transaction_finish_initial(struct files_ref_store *refs,
>>
>>  	/* Fail if a refname appears more than once in the transaction: */
>>  	for (i = 0; i < transaction->nr; i++)
>> -		string_list_append(&affected_refnames,
>> -				   transaction->updates[i]->refname);
>> +		if (!(transaction->updates[i]->flags & REF_LOG_ONLY))
>> +			string_list_append(&affected_refnames,
>> +					   transaction->updates[i]->refname);
>>  	string_list_sort(&affected_refnames);
>>  	if (ref_update_reject_duplicates(&affected_refnames, err)) {
>>  		ret = TRANSACTION_GENERIC_ERROR;
>
> This on the other hand is sensible -- having multiple REF_LOG_ONLY
> transactions queued is fine.
>
>> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
>> index b2e3ba877de9e59fea5a4d066eb13e60ef22a32b..d9d2e28122a00ddd7f835c35a5851e390761885b 100644
>> --- a/refs/reftable-backend.c
>> +++ b/refs/reftable-backend.c
>> @@ -1405,7 +1410,11 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
>>  				}
>>
>>  				fill_reftable_log_record(log, &c);
>> -				log->update_index = ts;
>> +
>> +				update_index = strintmap_get(&logs_ts, u->refname);
>> +				log->update_index = update_index;
>> +				strintmap_set(&logs_ts, u->refname, update_index+1);
>
> So we're now tracking update indices via another map in order to ensure
> that the update index will be increased if we have multiple reflog
> entries for the same refname. Can we avoid that overhead by instead just
> having a global update index counter that increases for every single
> reflog entry, regardless of whether we have multiple ones queued up for
> the same reference?
>
> I guess the result would be kind of weird as a single transaction with
> multiple ref updates would now always contain N different update
> indices. Maybe there's an alternative that allows us to reduce the cost,
> like only doing this for REF_LOG_ONLY updates?
>
> I'm mostly being careful because this here is the hot loop of writing
> refs, so I don't want to regress performance.

Thanks for bringing this up. I was thinking hard about this for a while.
I also did some local benchmarking, for 10000 atomic writes, I couldn't
find a note-worthy regression.

But I really like the point you made about how this could probably use a
counter. I think we can use `u->index`. The index field was added to
ensure that the logs stay in order when we sort inside
`write_transaction_table()`. But we can use the same here. We can simply
do

    log->update_index = ts + u->index;

This would increment the update_index as needed and also only kick in
when the `index` itself is set. Which would be only for reflog migration
between backends.

I think this would work well.

Karthik
Attachment:
signature.asc

Description: PGP signature