Re: v2.47.0-rc1 test failure on cygwin

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 04, 2024 at 11:13:34AM +0200, Johannes Schindelin wrote:
> Hi Patrick,
> 
> On Fri, 4 Oct 2024, Patrick Steinhardt wrote:
> 
> > On Fri, Oct 04, 2024 at 05:59:30AM +0200, Patrick Steinhardt wrote:
> > > On Fri, Oct 04, 2024 at 02:02:44AM +0100, Ramsay Jones wrote:
> > > >
> > > > Just a quick heads up: t0610-reftable-basics.sh test 47 (ref
> > > > transaction: many concurrent writers) fails on cygwin. The tail end
> > > > of the debug output for this test looks like:
> > > >
> > > [snip]
> > > >
> > > > t0610-reftable-basics.sh passed on 'rc0', but this test (and the
> > > > timeout facility) is new in 'rc1'. I tried simply increasing the
> > > > timeout (10 fold), but that didn't change the result. (I didn't
> > > > really expect it to - the 'reftable: transaction prepare: I/O error'
> > > > does not look timing related!).
> > > >
> > > > Again, just a heads up. (I can't look at it until tomorrow now; any
> > > > ideas?)
> > >
> > > This failure is kind of known and discussed in [1]. Just to make it
> > > explicit: this test failure doesn't really surface a regression, the
> > > reftable code already failed for concurrent writes before. I fixed
> > > that and added the test that is now flaky, as the fix itself is
> > > seemingly only sufficient on Linux and macOS.
> > >
> > > I didn't yet have the time to look at whether I can fix it, but should
> > > finally find the time to do so today.
> >
> > Hm, interestingly enough I cannot reproduce the issue on Cygwin myself,
> > but I can reproduce the issue with MinGW. And in fact, the logs you have
> > sent all indicate that we cannot acquire the lock, there is no sign of
> > I/O errors here. So I guess you're running into timeout issues. Does the
> > following patch fix this for you?
> >
> > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> > index 2d951c8ceb..b5cad805d4 100755
> > --- a/t/t0610-reftable-basics.sh
> > +++ b/t/t0610-reftable-basics.sh
> > @@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
> >  	git init repo &&
> >  	(
> >  		cd repo &&
> > -		# Set a high timeout such that a busy CI machine will not abort
> > -		# early. 10 seconds should hopefully be ample of time to make
> > -		# this non-flaky.
> > -		git config set reftable.lockTimeout 10000 &&
> > +		git config set reftable.lockTimeout -1 &&
> >  		test_commit --no-tag initial &&
> >
> >  		head=$(git rev-parse HEAD) &&
> 
> That looks plausible to me, as this test is exercising the POSIX emulation
> layer of Cygwin much more than Git itself (and therefore I expect this
> test case to take a really long time on Cygwin). If it turns out that this
> works around the issue, I would propose to use something like this
> instead:
> 
> 		# Cygwin needs a slightly longer timeout
> 		if have_prereq !CYGWIN
> 		then
> 			git config set reftable.lockTimeout 10000
> 		else
> 			git config set reftable.lockTimeout 60000
> 		fi &&

We also saw that this creates an issue when running e.g. with the memory
and leak sanitizers. So I'd just generally raise the timeout value to
something way higher to avoid flakiness, like 5 minutes.

> > The issue on Win32 is different: we cannot commit the "tables.list" lock
> > via rename(3P) because the target file may be open for reading by a
> > concurrent process. I guess that Cygwin has proper POSIX semantics for
> > rename(3P) and thus doesn't hit the same issue.
> 
> Indeed, this is where the symptom lies. I worked around it in Git for
> Windows v2.47.0-rc1 via this patch:
> 
> -- snipsnap --
> diff --git a/compat/mingw.c b/compat/mingw.c
> index c1030e40f227..56db193d1360 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -784,6 +784,7 @@ int mingw_open (const char *filename, int oflags, ...)
>  	int fd, create = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL);
>  	wchar_t wfilename[MAX_LONG_PATH];
>  	open_fn_t open_fn;
> +	int tries = 0;
> 
>  	va_start(args, oflags);
>  	mode = va_arg(args, int);
> @@ -813,7 +814,11 @@ int mingw_open (const char *filename, int oflags, ...)
>  	else if (xutftowcs_long_path(wfilename, filename) < 0)
>  		return -1;
> 
> -	fd = open_fn(wfilename, oflags, mode);
> +	do {
> +		fd = open_fn(wfilename, oflags, mode);
> +	} while (fd < 0 && GetLastError() == ERROR_SHARING_VIOLATION &&
> +		 retry_ask_yes_no(&tries, "Opening '%s' failed because another "
> +			"application accessed it. Try again?", filename));
> 
>  	if ((oflags & O_CREAT) && fd >= 0 && are_wsl_compatible_mode_bits_enabled()) {
>  		_mode_t wsl_mode = S_IFREG | (mode&0777);

Wait, this is surprising to me as I saw the error happening when calling
rename, not open. So why would retries in `open()` fix the symptom? I'm
probably missing something.

In any case I also tried something like the below patch (sorry,
whitespace-broken). But unfortunately this still caused permission
errors when the new path was held open by another process.

I think for now I'd still lean into the direction of adding the !WINDOWS
prerequisite to the test and increasing timeouts such that I can
continue to investigate without time pressure.

Patrick

@ -2152,9 +2152,12 @@ int mingw_accept(int sockfd1, struct sockaddr *sa, socklen_t *sz)
 int mingw_rename(const char *pold, const char *pnew)
 {
        DWORD attrs, gle;
-       int tries = 0;
+       int tries = 0, wpnew_len, wpold_len;
        wchar_t wpold[MAX_PATH], wpnew[MAX_PATH];
-       if (xutftowcs_path(wpold, pold) < 0 || xutftowcs_path(wpnew, pnew) < 0)
+
+       wpold_len = xutftowcs_path(wpold, pold);
+       wpnew_len = xutftowcs_path(wpnew, pnew);
+       if (wpold_len < 0 || wpnew_len < 0)
                return -1;
 
        /*
@@ -2166,6 +2169,58 @@ int mingw_rename(const char *pold, const char *pnew)
        if (errno != EEXIST)
                return -1;
 repeat:
+       {
+#define FileRenameInfoEx                      22
+               enum {
+                       FILE_RENAME_REPLACE_IF_EXISTS         = 0x01,
+                       FILE_RENAME_POSIX_SEMANTICS           = 0x02,
+                       FILE_RENAME_IGNORE_READONLY_ATTRIBUTE = 0x40,
+               };
+               typedef struct FILE_RENAME_INFORMATION {
+                       union {
+                               BOOLEAN ReplaceIfExists;
+                               ULONG Flags;
+                       };
+                       HANDLE RootDirectory;
+                       ULONG FileNameLength;
+                       WCHAR FileName[1];
+               } *PFILE_RENAME_INFORMATION;
+               size_t wpnew_size = wpnew_len * sizeof(wchar_t);
+               PFILE_RENAME_INFORMATION fri = NULL;
+               HANDLE handle = NULL;
+               BOOL success;
+               int ret;
+
+               handle = CreateFileW(wpold, DELETE,
+                                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL,
+                                    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
+               if (handle == INVALID_HANDLE_VALUE) {
+                       errno = ENOENT;
+                       ret = -1;
+                       goto out;
+               }
+
+               fri = xcalloc(1, sizeof(*fri) + wpnew_size);
+               fri->Flags = FILE_RENAME_REPLACE_IF_EXISTS | FILE_RENAME_POSIX_SEMANTICS |
+                            FILE_RENAME_IGNORE_READONLY_ATTRIBUTE;
+               fri->FileNameLength = wpnew_len;
+               memcpy(fri->FileName, wpnew, wpnew_size);
+
+               success = SetFileInformationByHandle(handle, FileRenameInfoEx,
+                                                    fri, sizeof(*fri) + wpnew_size);
+               if (!success) {
+                       errno = err_win_to_posix(GetLastError());
+                       ret = -1;
+                       goto out;
+               }
+
+               ret = 0;
+out:
+               CloseHandle(handle);
+               free(fri);
+               return ret;
+       }
+




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux