Re: [PATCH v2 2/3] strbuf: set errno to 0 after strbuf_getcwd

Kyle Lippincott <spectral@xxxxxxxxxx> · Fri, 2 Aug 2024 16:51:31 -0700

On Fri, Aug 2, 2024 at 2:32 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Kyle Lippincott via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > From: Kyle Lippincott <spectral@xxxxxxxxxx>
> >
> > If the loop executes more than once due to cwd being longer than 128
> > bytes, then `errno = ERANGE` might persist outside of this function.
> > This technically shouldn't be a problem, as all locations where the
> > value in `errno` is tested should either (a) call a function that's
> > guaranteed to set `errno` to 0 on success, or (b) set `errno` to 0 prior
> > to calling the function that only conditionally sets errno, such as the
> > `strtod` function. In the case of functions in category (b), it's easy
> > to forget to do that.
> >
> > Set `errno = 0;` prior to exiting from `strbuf_getcwd` successfully.
> > This matches the behavior in functions like `run_transaction_hook`
> > (refs.c:2176) and `read_ref_internal` (refs/files-backend.c:564).
>
> I am still uneasy to see this unconditional clearing, which looks
> more like spreading the bad practice from two places you identified
> than following good behaviour modelled after these two places.
>
> But I'll let it pass.
>
> As long as our programmers understand that across strbuf_getcwd(),
> errno will *not* be preserved, even if the function returns success,
> it would be OK.  As the usual convention around errno is that a
> successful call would leave errno intact, not clear it to 0, it
> would make it a bit harder to learn our API for newcomers, though.

I'm sympathetic to that argument. If you'd prefer to not have this
patch, I'm fine with it not landing, and instead at some future date I
may try to work on those #leftoverbits from the previous patch (to
make a safer wrapper around strtoX, and ban the use of the unwrapped
versions), or someone else can if they beat me to it.

Since this is wrapping a posix function, and posix has things to say
about this (see below), I agree that it shouldn't set it to 0, and
withdraw this patch.

I'm including my references below mostly because with the information
I just acquired, I think that any attempt to _preserve_ errno is also
folly. No function we write, unless we explicitly state that it _will_
preserve errno, should feel obligated to do so. The number of cases
where errno _could_ be modified according to the various
specifications (C99 and posix) are just too numerous.

---

Perhaps because I'm not all that experienced with C, but when I did C
a couple decades ago, I operated in a mode where basically every
function was actively hostile. If I wanted errno preserved across a
function call, then it's up to me (the caller) to do so, regardless of
what the current implementation of that function says will happen,
because that can change at any point. Unless the function is
documented as errno-preserving, I'm going to treat it as
errno-hostile. In practice, this didn't really matter much, as I've
never found `if (some_func()) { if (!some_other_func()) { /* use errno
from `some_func` */ } }` logic to happen often, but maybe it does in
"real" programs, I was just a hobbyist self-teaching at the time.

The C standard has a very precise definition of how the library
functions defined in the C specification will act. It guarantees:
- the library functions defined in the specification will never set errno to 0.
- the library functions defined in the specification may set the value
to non-zero whether an error occurs or not, "provided the use of errno
is not documented in the description of the function in this
International Standard". What this means is that (a) if the function
as defined in the C standard mentions errno, it can only set the
values as specified there, and (b) if the function as defined in the C
standard does _not_ mention errno, such as `fopen` or `strstr`, it can
do _whatever it wants_ to errno, even on success, _except_ set it to
0.

POSIX has similar language
(https://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html),
with some key differences:
- The value of errno should only be examined when it is indicated to
be valid by a function's return value.
- The setting of errno after a successful call to a function is
unspecified unless the description of that function specifies that
errno shall not be modified.

This means that unlike the C specification, which says that if a
function doesn't describe its use of errno it can do anything it wants
to errno [except set it to 0], in POSIX, a function can do anything it
wants to errno [except set it to 0] at any time.

What this means in practice is that errno should never be assumed to
be preserved across calls to posix functions (like getcwd). Also,
strbuf_getcwd calls free, malloc, and realloc, none of which mention
errno in the C specification, so they can do whatever they want to it
[except set it to 0]. That I was able to find one single function that
was causing problems is luck, and not guaranteed by any specification.

Kind of makes me want to try writing an actively hostile C99 and POSIX
environment, and see how many things break with it. :) C99 spec
doesn't say anything about malloc setting errno? Ok! malloc now sets
errno to ENOENT on tuesdays [in GMT because I'm not a monster], but
only on success. On any other day, it'll set it to ERANGE, regardless
of success or failure.

>
> Thanks.
>
> > Signed-off-by: Kyle Lippincott <spectral@xxxxxxxxxx>
> > ---
> >  strbuf.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/strbuf.c b/strbuf.c
> > index 3d2189a7f64..b94ef040ab0 100644
> > --- a/strbuf.c
> > +++ b/strbuf.c
> > @@ -601,6 +601,7 @@ int strbuf_getcwd(struct strbuf *sb)
> >               strbuf_grow(sb, guessed_len);
> >               if (getcwd(sb->buf, sb->alloc)) {
> >                       strbuf_setlen(sb, strlen(sb->buf));
> > +                     errno = 0;
> >                       return 0;
> >               }