Re: v2.47.0-rc1 test failure on cygwin

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Patrick,

On Fri, 4 Oct 2024, Patrick Steinhardt wrote:

> On Fri, Oct 04, 2024 at 11:13:34AM +0200, Johannes Schindelin wrote:
> >
> > On Fri, 4 Oct 2024, Patrick Steinhardt wrote:
> >
> > > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> > > index 2d951c8ceb..b5cad805d4 100755
> > > --- a/t/t0610-reftable-basics.sh
> > > +++ b/t/t0610-reftable-basics.sh
> > > @@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
> > >  	git init repo &&
> > >  	(
> > >  		cd repo &&
> > > -		# Set a high timeout such that a busy CI machine will not abort
> > > -		# early. 10 seconds should hopefully be ample of time to make
> > > -		# this non-flaky.
> > > -		git config set reftable.lockTimeout 10000 &&
> > > +		git config set reftable.lockTimeout -1 &&
> > >  		test_commit --no-tag initial &&
> > >
> > >  		head=$(git rev-parse HEAD) &&
> >
> > That looks plausible to me, as this test is exercising the POSIX emulation
> > layer of Cygwin much more than Git itself (and therefore I expect this
> > test case to take a really long time on Cygwin). If it turns out that this
> > works around the issue, I would propose to use something like this
> > instead:
> >
> > 		# Cygwin needs a slightly longer timeout
> > 		if have_prereq !CYGWIN
> > 		then
> > 			git config set reftable.lockTimeout 10000
> > 		else
> > 			git config set reftable.lockTimeout 60000
> > 		fi &&
>
> We also saw that this creates an issue when running e.g. with the memory
> and leak sanitizers. So I'd just generally raise the timeout value to
> something way higher to avoid flakiness, like 5 minutes.

Sounds good!

> > > The issue on Win32 is different: we cannot commit the "tables.list" lock
> > > via rename(3P) because the target file may be open for reading by a
> > > concurrent process. I guess that Cygwin has proper POSIX semantics for
> > > rename(3P) and thus doesn't hit the same issue.
> >
> > Indeed, this is where the symptom lies. I worked around it in Git for
> > Windows v2.47.0-rc1 via this patch:
> >
> > -- snipsnap --
> > diff --git a/compat/mingw.c b/compat/mingw.c
> > index c1030e40f227..56db193d1360 100644
> > --- a/compat/mingw.c
> > +++ b/compat/mingw.c
> > @@ -784,6 +784,7 @@ int mingw_open (const char *filename, int oflags, ...)
> >  	int fd, create = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL);
> >  	wchar_t wfilename[MAX_LONG_PATH];
> >  	open_fn_t open_fn;
> > +	int tries = 0;
> >
> >  	va_start(args, oflags);
> >  	mode = va_arg(args, int);
> > @@ -813,7 +814,11 @@ int mingw_open (const char *filename, int oflags, ...)
> >  	else if (xutftowcs_long_path(wfilename, filename) < 0)
> >  		return -1;
> >
> > -	fd = open_fn(wfilename, oflags, mode);
> > +	do {
> > +		fd = open_fn(wfilename, oflags, mode);
> > +	} while (fd < 0 && GetLastError() == ERROR_SHARING_VIOLATION &&
> > +		 retry_ask_yes_no(&tries, "Opening '%s' failed because another "
> > +			"application accessed it. Try again?", filename));
> >
> >  	if ((oflags & O_CREAT) && fd >= 0 && are_wsl_compatible_mode_bits_enabled()) {
> >  		_mode_t wsl_mode = S_IFREG | (mode&0777);
>
> Wait, this is surprising to me as I saw the error happening when calling
> rename, not open. So why would retries in `open()` fix the symptom? I'm
> probably missing something.

I am sorry, I did not really read the bug report in detail, I only meant
to unblock Git for Windows v2.47.0-rc1 and thought I had a fix in hand. It
certainly fixed the failures on my local machine, but it unfortunately did
not fix the problems in CI.

I tried to debug in CI a bit, but it is a gnarly bug to investigate, what
with plenty of processes the intentionally block each other, and no `gdb`
to help.

> In any case I also tried something like the below patch (sorry,
> whitespace-broken).

Oh, you reminded me that the `mingw_rename()` function looks
_substantially_ different in Git for Windows than in Git. I did not have
the time (or the strength of mind, more like) to upstream those changes
yet.

> But unfortunately this still caused permission errors when the new path
> was held open by another process.

Yes, this will _always_ be a problem, I think. The
`FILE_RENAME_POSIX_SEMANTICS` as per its documentation should help, but if
it does not in your tests it might actually not quite work as advertised
(wouldn't be the first time I encounter such an issue).

I tried to read through the code (it's a lot!) to figure out whether there
is potentially any situation when the `tables.list` file is opened but not
closed immediately, but couldn't find any. Do you know off-hand of any
such scenario?

> I think for now I'd still lean into the direction of adding the !WINDOWS
> prerequisite to the test and increasing timeouts such that I can
> continue to investigate without time pressure.

Let me bang my head against this problem for a little while longer. You
might be right, though, that this is a thing we cannot fix in time for
v2.47.0, which would be sad.

Ciao,
Johannes

>
> Patrick
>
> @ -2152,9 +2152,12 @@ int mingw_accept(int sockfd1, struct sockaddr *sa, socklen_t *sz)
>  int mingw_rename(const char *pold, const char *pnew)
>  {
>         DWORD attrs, gle;
> -       int tries = 0;
> +       int tries = 0, wpnew_len, wpold_len;
>         wchar_t wpold[MAX_PATH], wpnew[MAX_PATH];
> -       if (xutftowcs_path(wpold, pold) < 0 || xutftowcs_path(wpnew, pnew) < 0)
> +
> +       wpold_len = xutftowcs_path(wpold, pold);
> +       wpnew_len = xutftowcs_path(wpnew, pnew);
> +       if (wpold_len < 0 || wpnew_len < 0)
>                 return -1;
>
>         /*
> @@ -2166,6 +2169,58 @@ int mingw_rename(const char *pold, const char *pnew)
>         if (errno != EEXIST)
>                 return -1;
>  repeat:
> +       {
> +#define FileRenameInfoEx                      22
> +               enum {
> +                       FILE_RENAME_REPLACE_IF_EXISTS         = 0x01,
> +                       FILE_RENAME_POSIX_SEMANTICS           = 0x02,
> +                       FILE_RENAME_IGNORE_READONLY_ATTRIBUTE = 0x40,
> +               };
> +               typedef struct FILE_RENAME_INFORMATION {
> +                       union {
> +                               BOOLEAN ReplaceIfExists;
> +                               ULONG Flags;
> +                       };
> +                       HANDLE RootDirectory;
> +                       ULONG FileNameLength;
> +                       WCHAR FileName[1];
> +               } *PFILE_RENAME_INFORMATION;
> +               size_t wpnew_size = wpnew_len * sizeof(wchar_t);
> +               PFILE_RENAME_INFORMATION fri = NULL;
> +               HANDLE handle = NULL;
> +               BOOL success;
> +               int ret;
> +
> +               handle = CreateFileW(wpold, DELETE,
> +                                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL,
> +                                    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
> +               if (handle == INVALID_HANDLE_VALUE) {
> +                       errno = ENOENT;
> +                       ret = -1;
> +                       goto out;
> +               }
> +
> +               fri = xcalloc(1, sizeof(*fri) + wpnew_size);
> +               fri->Flags = FILE_RENAME_REPLACE_IF_EXISTS | FILE_RENAME_POSIX_SEMANTICS |
> +                            FILE_RENAME_IGNORE_READONLY_ATTRIBUTE;
> +               fri->FileNameLength = wpnew_len;
> +               memcpy(fri->FileName, wpnew, wpnew_size);
> +
> +               success = SetFileInformationByHandle(handle, FileRenameInfoEx,
> +                                                    fri, sizeof(*fri) + wpnew_size);
> +               if (!success) {
> +                       errno = err_win_to_posix(GetLastError());
> +                       ret = -1;
> +                       goto out;
> +               }
> +
> +               ret = 0;
> +out:
> +               CloseHandle(handle);
> +               free(fri);
> +               return ret;
> +       }
> +
>
>





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux