Re: v2.47.0-rc1 test failure on cygwin

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Patrick,

On Fri, 4 Oct 2024, Patrick Steinhardt wrote:

> On Fri, Oct 04, 2024 at 05:59:30AM +0200, Patrick Steinhardt wrote:
> > On Fri, Oct 04, 2024 at 02:02:44AM +0100, Ramsay Jones wrote:
> > >
> > > Just a quick heads up: t0610-reftable-basics.sh test 47 (ref
> > > transaction: many concurrent writers) fails on cygwin. The tail end
> > > of the debug output for this test looks like:
> > >
> > [snip]
> > >
> > > t0610-reftable-basics.sh passed on 'rc0', but this test (and the
> > > timeout facility) is new in 'rc1'. I tried simply increasing the
> > > timeout (10 fold), but that didn't change the result. (I didn't
> > > really expect it to - the 'reftable: transaction prepare: I/O error'
> > > does not look timing related!).
> > >
> > > Again, just a heads up. (I can't look at it until tomorrow now; any
> > > ideas?)
> >
> > This failure is kind of known and discussed in [1]. Just to make it
> > explicit: this test failure doesn't really surface a regression, the
> > reftable code already failed for concurrent writes before. I fixed
> > that and added the test that is now flaky, as the fix itself is
> > seemingly only sufficient on Linux and macOS.
> >
> > I didn't yet have the time to look at whether I can fix it, but should
> > finally find the time to do so today.
>
> Hm, interestingly enough I cannot reproduce the issue on Cygwin myself,
> but I can reproduce the issue with MinGW. And in fact, the logs you have
> sent all indicate that we cannot acquire the lock, there is no sign of
> I/O errors here. So I guess you're running into timeout issues. Does the
> following patch fix this for you?
>
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 2d951c8ceb..b5cad805d4 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' '
>  	git init repo &&
>  	(
>  		cd repo &&
> -		# Set a high timeout such that a busy CI machine will not abort
> -		# early. 10 seconds should hopefully be ample of time to make
> -		# this non-flaky.
> -		git config set reftable.lockTimeout 10000 &&
> +		git config set reftable.lockTimeout -1 &&
>  		test_commit --no-tag initial &&
>
>  		head=$(git rev-parse HEAD) &&

That looks plausible to me, as this test is exercising the POSIX emulation
layer of Cygwin much more than Git itself (and therefore I expect this
test case to take a really long time on Cygwin). If it turns out that this
works around the issue, I would propose to use something like this
instead:

		# Cygwin needs a slightly longer timeout
		if have_prereq !CYGWIN
		then
			git config set reftable.lockTimeout 10000
		else
			git config set reftable.lockTimeout 60000
		fi &&

> The issue on Win32 is different: we cannot commit the "tables.list" lock
> via rename(3P) because the target file may be open for reading by a
> concurrent process. I guess that Cygwin has proper POSIX semantics for
> rename(3P) and thus doesn't hit the same issue.

Indeed, this is where the symptom lies. I worked around it in Git for
Windows v2.47.0-rc1 via this patch:

-- snipsnap --
diff --git a/compat/mingw.c b/compat/mingw.c
index c1030e40f227..56db193d1360 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -784,6 +784,7 @@ int mingw_open (const char *filename, int oflags, ...)
 	int fd, create = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL);
 	wchar_t wfilename[MAX_LONG_PATH];
 	open_fn_t open_fn;
+	int tries = 0;

 	va_start(args, oflags);
 	mode = va_arg(args, int);
@@ -813,7 +814,11 @@ int mingw_open (const char *filename, int oflags, ...)
 	else if (xutftowcs_long_path(wfilename, filename) < 0)
 		return -1;

-	fd = open_fn(wfilename, oflags, mode);
+	do {
+		fd = open_fn(wfilename, oflags, mode);
+	} while (fd < 0 && GetLastError() == ERROR_SHARING_VIOLATION &&
+		 retry_ask_yes_no(&tries, "Opening '%s' failed because another "
+			"application accessed it. Try again?", filename));

 	if ((oflags & O_CREAT) && fd >= 0 && are_wsl_compatible_mode_bits_enabled()) {
 		_mode_t wsl_mode = S_IFREG | (mode&0777);





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux