Re: [PATCH 04/20] refs_verify_refname_available(): implement once for all backends

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Mon, 20 Mar 2017 23:20:40 +0100

On 03/20/2017 06:42 PM, Jeff King wrote:
> On Mon, Mar 20, 2017 at 05:33:09PM +0100, Michael Haggerty wrote:
> 
>> It turns out that we can now implement
>> `refs_verify_refname_available()` based on the other virtual
>> functions, so there is no need for it to be defined at the backend
>> level. Instead, define it once in `refs.c` and remove the
>> `files_backend` definition.
> 
> Does this mean that backends can no longer provide storage for
> D/F-conflicted refnames (i.e., "refs/foo" and "refs/foo/bar")? It looks
> like the global verify_refname_available() has that logic baked in.

The `verify_refname_available()` function implements the "no D/F
conflict" policy. But it's called from the backends, not from the common
code, and nobody says that a backend needs to call the function.

> [...]
>>  int refs_verify_refname_available(struct ref_store *refs,
>>  				  const char *refname,
>> -				  const struct string_list *extra,
>> +				  const struct string_list *extras,
>>  				  const struct string_list *skip,
>>  				  struct strbuf *err)
>>  {
>> [...]
>> +		/*
>> +		 * We are still at a leading dir of the refname (e.g.,
>> +		 * "refs/foo"; if there is a reference with that name,
>> +		 * it is a conflict, *unless* it is in skip.
>> +		 */
>> +		if (skip && string_list_has_string(skip, dirname.buf))
>> +			continue;
>> +
>> +		if (!refs_read_raw_ref(refs, dirname.buf, oid.hash, &referent, &type)) {
>> +			strbuf_addf(err, "'%s' exists; cannot create '%s'",
>> +				    dirname.buf, refname);
>> +			goto cleanup;
>> +		}
> 
> We don't really care about reading the ref value here; we just care if
> it exists. Does that matter for efficiency (e.g., for the files backend
> it's a stat() versus an open/read/close)? I guess the difference only
> matters when it _does_ exist, which is the uncommon error case.

Yes, I assume that the conflict cases are unusual so I wasn't worrying
too much about their performance.

Not quite so obvious is that the new code sometimes checks for conflicts
against both loose and packed references in situations where the old
code only checked against packed references. Specifically, this happens
when the code has just read, or locked, or failed to find a loose
reference, so it could infer that there are no loose references that
could conflict. I don't think that will be noticeable, because it is the
reading of the whole `packed-refs` file that is a big expensive
operation that it is important to avoid. Anyway, later patches (i.e.,
after there is a `packed_ref_store`) can switch back to checking only
packed references and get back the old optimization.

> (Also, I suspect the loose ref cache always just reads everything in the
> current code, though with the iterator approach in theory we could stop
> doing that).

The loose ref cache reads directories into the cache lazily,
breadth-first. So it reads all of the entries in the directories leading
to the reference being looked up, but no others. When iterating, it
reads the parent directories of the prefix that is being iterated over
plus the whole subtree under the prefix, and that's it (though this
optimization is not yet wired through to `git for-each-ref`).

Michael