Re: [PATCH 0/6] refs: introduce support for partial reference transactions

Phillip Wood <phillip.wood123@xxxxxxxxx> · Wed, 19 Feb 2025 14:34:15 +0000

Hi Karthik

On 12/02/2025 12:34, Karthik Nayak wrote:
On 07/02/2025 07:34, Karthik Nayak wrote:
Git's reference updates are traditionally atomic

I'm nitpicking but the updates aren't actually atomic, if a transaction
updates two refs then it is possible for another process to see the one
ref pointing to the new value and the other pointing to the old value.

Good point. This is true in the case of the files backend, since updates
involve locking individual files and during the commit phase, there is a
possibility that one ref is updated while the other is yet to be
(committing of the lock is not global but rather per ref file).

However this is not the case with the reftable backend, there, updates
are written to a new table and committed at the end after locking the
table. So in the reftable backend, this is indeed atomic.

Ah, interesting. That explains why batching updates is so much more 
efficient when using the reftable backend.

This series introduces support for partial reference transactions,
allowing individual reference updates to fail while letting others
proceed.

This sounds like it's abusing ref transactions to implement a
performance optimization.

I understand where you're coming from. This is definitely a stray from
the regular atomic behavior, that transactions promise. But I would say
this is more of an exception handling for the regular transaction
mechanism and AFAIK this is also something that some of the databases
support (see EXCEPTION in PostgreSQL).

Overall, we're adding an exception handling support to the existing
transaction interface.

My understanding of exception handling is that if an error occurs then 
an error handler is called (reading [1] that seems to be what PostgreSQL 
does as well). Is that what is being proposed here? I thought this 
series added a flag to ignore errors rather than provide a way to handle 
them.

[1] 
https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

I wonder if it would be better to provide that
via a different interface than shares the same underling implementation
as transactions. That would make it clear to someone reading the code
that individual ref updates can fail without affecting the rest. Burying
that detail in a flag makes it rather easy to miss.

Thinking this out, having a different interface sound good, but I feel
we'd end up with the same structure as currently presented in this
series. Only other way is to really split the implementation to support
partial transactions as a entity of its own. In that case, we'd end up
with code duplication.

Do you think you can expand a little more here?

I was thinking of a function that took a list of refs to update and made 
a best effort to update them, ignoring any updates that fail.

My concern with adding a flag to ignore errors in the transaction api is 
that a partial transaction is a contradiction in terms. I'm also 
concerned that it seems to be ignoring all errors. I'd be happier if 
there was someway for the caller to specify which errors to ignore or if 
the caller could provide a callback to handle any errors. That way a 
caller could ignore d/f conflicts but still cause the transaction to 
fail if there was an i/o or could create a reference if it did not exist 
but leave it unchanged if it did exist.

Best Wishes

Phillip