Re: [PATCH 1/8] refs: remove EINVAL specification from the errno sideband in read_raw_ref_fn

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 30 Apr 2021 11:38:40 +0900

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Han-Wen Nienhuys <hanwen@xxxxxxxxxx>
>
> A grep for EINVAL */*c reveals that no code inspects EINVAL after reading
> references.

We often use a pattern (which is common) like this:

	if (some_func_in_ref_API(...) < 0) {
		if (errno == ENOENT || errno == EISDIR)
			... it is OK for the file to be missing ...
		else
			... error ...
	}

If a piece of code currently sets EINVAL to errno manually when
signalling a failure by returning a negative value to communicate
with such a caller, we wouldn't see EINVAL mentioned, so such a grep
alone would not help us guarantee the correctness of an update to
stop assignment of EINVAL at all.  The callers must be vetted more
carefully than "we are happy that nobody explicitly mentions EINVAL".

> The files ref backend does use EINVAL so parse_loose_ref_contents() can
> communicate to lock_raw_ref() about garbage following the hex SHA1, or a short
> read in files_read_raw_ref(), but the files backend does not call into
> refs_read_raw_ref(), so its EINVAL sideband error is unused.

This paragraph is confusing.  It says EINVAL is used to signal
lock_raw_ref(), and it says EINVAL is not used by the same files
backend.  Which is correct?  If one part of the backend uses it, and
other parts don't, wouldn't the backend as a whole still use it?

Having said that, ...

> - * Return 0 on success. If the ref doesn't exist, set errno to ENOENT
> - * and return -1. If the ref exists but is neither a symbolic ref nor
> - * an object ID, it is broken; set REF_ISBROKEN in type, set errno to
> - * EINVAL, and return -1. If there is another error reading the ref,
> - * set errno appropriately and return -1.

... the mention (and requirement) of EINVAL seems redundant, as it
sounds sufficient for the caller to inspect 'type' to see if it is
REF_ISBOKEN.  So it may be OK for the code that gives REF_ISBROKEN
to type *not* to set errno to EINVAL, as long as it won't leave it
as ENOENT (meaning, an unrelated system call failed earlier may have
set errno to ENOENT, and after having dealt with such an error, the
control may have reached to the codepath we are interested in
here---errno must be cleared to some value other than ENOENT, and
assigning EINVAL is as good as any).

That is because there is a codeflow like this:

	if (files_read_raw_ref(...)) {
		if (errno == ENOENT) {
			... do various things ...
		} else if (errno == EISDIR) {
			... do different and various things ...
		} else if (errno == EINVAL && (*type & REF_ISBROKEN)) {
			... deal with broken ref ...
		}
		 ...
	}

where errno is looked at.

> + * Return 0 on success. If the ref doesn't exist, set errno to ENOENT and return
> + * -1. If the ref exists but is neither a symbolic ref nor an object ID, it is
> + * broken; set REF_ISBROKEN in type, and return -1. If there is another error
> + * reading the ref, set errno appropriately and return -1.

So, this is not sufficient to let caller correctly and safely handle
errors.  "set REF_ISBROKEN in type, set errno to something other
than ENOENT or EISDIR, and then return -1" is necessary, I would
think.