Re: [PATCH v3 00/14] ref-transactions-reflog

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 20 Nov 2014 11:56:41 +0100

On 11/20/2014 12:22 AM, Stefan Beller wrote:
> Sorry for the long delay.
> Thanks for the explanation and discussion.
> 
> So do I understand it right, that you are not opposing
> the introduction of "everything should go through transactions"
> but rather the detail and abstraction level of the API?

Correct.

> So starting from Michaels proposal in the first response:
> 
> 1. Add a reflog entry when a reference is updated in a transaction.
> 
> ok
> 
> 2. Rename a reflog file when the corresponding reference is renamed.
> 
> This should happen within the same transaction as the reference is
> renamed, right?

Yes. Maybe there should be a "rename reference" operation that can be
added to a transaction, and it simply knows to rename any associated
reflogs. Then the calling code wouldn't have to worry about reflogs
explicitly in this case at all.

> So we don't have a multistep process here, which may abort in between having the
> reference updated and a broken reflog or vice versa. We want to either
> have both
> the ref and the reflog updated or neither.

Yes.

> 3. Delete the reflog when the corresponding reference is deleted [1].
> 
> also as one transaction?

It would be a side-effect of committing a transaction that contains a
reference deletion. The deletion of the reflog would be done at the same
time that the rest of the transaction is committed, and again the
calling code wouldn't have to explicitly worry about the reflogs.

> 4. Configure a reference to be reflogged.
> 5. Configure a reference to not be reflogged anymore and delete any
>    existing reflog.
> 
> Why do we need 4 and 5 here? Wouldn't all refs be reflog by default and
> why do I want to exclude some?
> 
> 6. Selectively expire old reflog entries, e.g., based on their age.
> 
> This is the maintenance operation, which you were talking about.
> In my vision, this also should go into one transaction. So you have the
> business logic figuring out all the changes ("drop reflog entry a b and d")
> and within one transaction we can perform all of the changes.

But if we take the approach described above, AFAICT this operation is
the only one that would require the caller to manipulate reflog entries
explicitly. And it has to iterate through the old reflog entries, decide
which ones to keep, possibly change its neighbors to eliminate gaps in
the chain, then stuff each of the reflog entries into a transaction one
by one. To allow this to be implemented on the caller side, the
transaction API has to be complicated in the following ways:

* Add a transaction_update_type (UPDATE_SHA1 vs. UPDATE_LOG).
* Add reflog_fd, reflog_lock, and committer members to struct ref_update.
* New function transaction_update_reflog().
* A new flag REFLOG_TRUNCATE that allows the reflog file to be truncated
before writing.
* Machinery that recognizes that a transaction contains multiple reflog
updates for the same reference and processes them specially to avoid
locking and rewriting the reflog file multiple times.

So this design has the caller serializing all reflog entries into
separate ref_update structs (which implies that they are held in RAM!)
only for ref_transaction_commit() to scan through all ref_updates
looking for reflog updates that go together so that they can be
processed as a whole. In other words, the caller picks the reflog apart
and then ref_transaction_commit() glues it back together. It's all very
contrived.

I suggest that the caller only be responsible for deciding which reflog
entries to keep (by supplying a callback function), and a new
expire_reflogs_for_me_please() API function be responsible for taking
out a lock, feeding the old reflog entries to the callback, expiring the
unwanted entries, optionally eliminating gaps in the chain (for the use
of "reflog [expire|delete] --rewrite"), writing the new reflog entries,
and optionally updating the reference itself (for the use of "reflog
[expire|delete] --updateref").

The benefit will be simpler code, a better separation of
responsibilities, and a simpler VTABLE that future reference backends
have to implement.

I would love to work on this but unfortunately have way too much on my
plate right now.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html