On Sat, Jan 25, 2025 at 05:41:42PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
>
> > diff --git a/compat/mingw.h b/compat/mingw.h
> > index ebfb8ba423..a555af8d54 100644
> > --- a/compat/mingw.h
> > +++ b/compat/mingw.h
> > @@ -224,8 +224,12 @@ int uname(struct utsname *buf);
> >   * replacements of existing functions
> >   */
> >
> > -int mingw_unlink(const char *pathname);
> > -#define unlink mingw_unlink
> > +int mingw_unlink(const char *pathname, int handle_in_use_error);
> > +#ifdef MINGW_DONT_HANDLE_IN_USE_ERROR
> > +# define unlink(path) mingw_unlink(path, 0)
> > +#else
> > +# define unlink(path) mingw_unlink(path, 1)
> > +#endif
>
> This one is yucky. All calls to unlink() in compilation units with
> the CPP macro defined are going to fail on a path that is in use,
> but in other code paths, there will be the retry loop.

Yeah, I don't like it either, but I couldn't think of a better way to
do this.

> Regardless of the platform, the code must be prepared to see its
> unlink() fail and deal with the failure, but I wonder how much the
> initial "if file-in-use caused a problem, retry with delay without
> bugging the end user" loop is helping.
>
> After that retry loop expires, we go interactive, and I can
> understand why end-users may be annoyed by the code going
> interactive at that point.
>
> But wouldn't it be too drastic a change to break out of the retry
> loop immediately after the initial failure?

In general: yes. For the reftable library: no. The reftable backend
already knows how to handle this situation just fine by simply
retrying the deletion on the next call to git-pack-refs(1). So the
file will eventually be removed, and it doesn't hurt to have it
sitting around as a stale file in the meantime.

> Unless the case found in reftable is that the process that has the
> file in use is ourselves, rather than somebody else that is not
> under our control, it could be that the current users are being
> helped by the retry loop, because these other users would quickly
> close and exit while we are retrying before going interactive. What
> I am getting at is that it might be a less drastic move, and one
> that helps users better, if we moved the "let's just accept the
> failure and return to the caller" to after that non-interactive
> retry loop, instead of "return after even the first failure." That
> way, we'll still keep the automatic and non-interactive recovery,
> and punt a bit earlier than before, namely before we go into the
> other interactive retry loop.

It would incur a delay that is basically unnecessary, though. The file
is not being held open by ourselves, but by a different process. And
because Git opens files with `FILE_SHARE_DELETE` on Windows, we even
know that it wouldn't be a Git process that has the file open, but
something else (e.g. JGit, as reported by the user). Chances are slim
that such a third-party client would close the file in the exact 71
milliseconds that we'd be sleeping for. And combined with the fact
that we know to clean up the file at a later point anyway, I think I'd
rather not incur a user-facing delay only for a slight chance that we
might succeed in unlinking the file.
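To illustrate the sharing semantics (just a sketch with a made-up
path, not code taken from Git itself): a handle opened with
`FILE_SHARE_DELETE` does not block a concurrent deletion, whereas a
handle opened without the flag makes the other process's DeleteFileW()
fail with ERROR_SHARING_VIOLATION, which is exactly the "file in use"
case we're discussing.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /*
         * Open with all three sharing flags. While this handle is
         * open, DeleteFileW(L"table.ref") from another process
         * succeeds; the file only disappears once the last handle is
         * closed. Drop FILE_SHARE_DELETE from the flags and that
         * DeleteFileW() instead fails with ERROR_SHARING_VIOLATION.
         */
        HANDLE h = CreateFileW(L"table.ref", GENERIC_READ,
                               FILE_SHARE_READ | FILE_SHARE_WRITE |
                               FILE_SHARE_DELETE,
                               NULL, OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "open failed: error %lu\n",
                    (unsigned long)GetLastError());
            return 1;
        }
        CloseHandle(h);
        return 0;
    }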
> Of course, if we are depending on the ability to unlink what _we_
> ourselves are using, we should stop doing that by reorganizing the
> code. I recall we have done such code shuffling before in other code
> paths, flipping the order between unlink and close so that we do not
> remove open files, only to mollify Windows. But if we are failing
> due to random other users having the file open at the same time, at
> least the earlier non-interactive retry loop sounds like a
> reasonable workaround for quirks in the underlying filesystem to me.

As far as I'm aware this isn't an issue in the reftable library; we're
being quite careful there. I'm quite sure that we'd otherwise have
gotten reports about that before seeing reports about JGit keeping the
files open.

Patrick