Re: [PATCH 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Thu, 26 Oct 2017 09:35:30 +0200

On 10/26/2017 08:46 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
>> Even when we are deleting references, we needn't overwrite the
>> `packed-refs` file if the references that we are deleting are not
>> present in that file. Implement this optimization as follows:
> 
> This goal I can understand.  files-transaction-prepare may see a
> deletion and in order to avoid a deletion of a ref from the
> file-backend to expose a stale entry in the packed-refs file, it
> prepares a transaction to remove the same ref that might exist in
> it.  If it turns out that there is no such ref in the packed-refs
> file, then we can leave the packed-refs file as-is without
> rewriting.
> 
>> * Add a function `is_packed_transaction_noop()`, which checks whether
>>   a given packed-refs transaction doesn't actually have to do
>>   anything. This function must be called while holding the
>>   `packed-refs` lock to avoid races.
> 
> This checks three things only to cover the most trivial case (I am
> perfectly happy that it punts on more complex cases).  If an update
> 
>  - checks the old value,
> 
>  - sets a new value, or
> 
>  - deletes a ref that is not on the filesystem,
> 
> it is not a trivial case.  I can understand the latter two (i.e. We
> are special casing the deletion, and an update with a new value is
> not.  If removing a file from the filesystem is not sufficient to
> delete, we may have to rewrite the packed-refs), but not the first
> one.  Is it because currently we do not say "delete this ref only
> when its current value is X"?

It wouldn't be too hard to allow updates with REF_HAVE_OLD to be
optimized away too. I didn't do it because

1. We currently only create updates for that packed_refs backend with
REF_HAVE_OLD turned off, so such could would be unreachable (and
untestable).

2. I wanted to keep the patch as simple as possible in case you decide
to merge it into 2.15.

There would also be a little bit of a leveling violation (or maybe the
function name is not ideal). `is_packed_transaction_noop()` could check
that `old_oid` values are correct, and if they all are, declare the
transaction a NOOP. (It wasn't *really* a NOOP, but its OP, namely
checking old values, has already been carried out.) But what if it finds
that an `old_oid` was incorrect? Right now
`is_packed_transaction_noop()` has no way to report something like a
TRANSACTION_GENERIC_ERROR.

It could be that long-term it makes more sense for this optimization to
happen in `packed_transaction_prepare()`. Except that function is
sometimes called for its side-effects, like sorting an existing file, in
which case overwriting the `packed-refs` file shouldn't be optimized away.

So overall it seemed easier to punt on this optimization at this point.

> Also it is not immediately obvious how the "is this noop" helper is
> checking the absence of the same-named ref in the current
> packed-refs file, which is what matters for the correctness of the
> optimization.

The check is in the second loop in `is_packed_transaction_noop()`:

	if (!refs_read_raw_ref(ref_store, update->refname,
			       oid.hash, &referent, &type) ||
	    errno != ENOENT) {
		/*
		 * We might have to actually delete that
		 * reference -> not a NOOP.
		 */
		ret = 0;
		break;
	}

If the refname doesn't exist, then `refs_read_raw_ref()` returns -1 and
sets errno to ENOENT, and the loop continues. Otherwise we exit with a
value of 0, meaning that the transaction is not a NOOP.

There are a lot of double-negatives here. It might be easier to follow
the logic if the sense of the function were inverted:
`is_packed_transaction_needed()`. Let me know if you have a strong
feeling about it.

Michael