Hi Patrick, On Fri, 4 Oct 2024, Patrick Steinhardt wrote: > On Fri, Oct 04, 2024 at 11:13:34AM +0200, Johannes Schindelin wrote: > > > > On Fri, 4 Oct 2024, Patrick Steinhardt wrote: > > > > > diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh > > > index 2d951c8ceb..b5cad805d4 100755 > > > --- a/t/t0610-reftable-basics.sh > > > +++ b/t/t0610-reftable-basics.sh > > > @@ -455,10 +455,7 @@ test_expect_success 'ref transaction: many concurrent writers' ' > > > git init repo && > > > ( > > > cd repo && > > > - # Set a high timeout such that a busy CI machine will not abort > > > - # early. 10 seconds should hopefully be ample of time to make > > > - # this non-flaky. > > > - git config set reftable.lockTimeout 10000 && > > > + git config set reftable.lockTimeout -1 && > > > test_commit --no-tag initial && > > > > > > head=$(git rev-parse HEAD) && > > > > That looks plausible to me, as this test is exercising the POSIX emulation > > layer of Cygwin much more than Git itself (and therefore I expect this > > test case to take a really long time on Cygwin). If it turns out that this > > works around the issue, I would propose to use something like this > > instead: > > > > # Cygwin needs a slightly longer timeout > > if have_prereq !CYGWIN > > then > > git config set reftable.lockTimeout 10000 > > else > > git config set reftable.lockTimeout 60000 > > fi && > > We also saw that this creates an issue when running e.g. with the memory > and leak sanitizers. So I'd just generally raise the timeout value to > something way higher to avoid flakiness, like 5 minutes. Sounds good! > > > The issue on Win32 is different: we cannot commit the "tables.list" lock > > > via rename(3P) because the target file may be open for reading by a > > > concurrent process. I guess that Cygwin has proper POSIX semantics for > > > rename(3P) and thus doesn't hit the same issue. > > > > Indeed, this is where the symptom lies. I worked around it in Git for > > Windows v2.47.0-rc1 via this patch: > > > > -- snipsnap -- > > diff --git a/compat/mingw.c b/compat/mingw.c > > index c1030e40f227..56db193d1360 100644 > > --- a/compat/mingw.c > > +++ b/compat/mingw.c > > @@ -784,6 +784,7 @@ int mingw_open (const char *filename, int oflags, ...) > > int fd, create = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL); > > wchar_t wfilename[MAX_LONG_PATH]; > > open_fn_t open_fn; > > + int tries = 0; > > > > va_start(args, oflags); > > mode = va_arg(args, int); > > @@ -813,7 +814,11 @@ int mingw_open (const char *filename, int oflags, ...) > > else if (xutftowcs_long_path(wfilename, filename) < 0) > > return -1; > > > > - fd = open_fn(wfilename, oflags, mode); > > + do { > > + fd = open_fn(wfilename, oflags, mode); > > + } while (fd < 0 && GetLastError() == ERROR_SHARING_VIOLATION && > > + retry_ask_yes_no(&tries, "Opening '%s' failed because another " > > + "application accessed it. Try again?", filename)); > > > > if ((oflags & O_CREAT) && fd >= 0 && are_wsl_compatible_mode_bits_enabled()) { > > _mode_t wsl_mode = S_IFREG | (mode&0777); > > Wait, this is surprising to me as I saw the error happening when calling > rename, not open. So why would retries in `open()` fix the symptom? I'm > probably missing something. I am sorry, I did not really read the bug report in detail, I only meant to unblock Git for Windows v2.47.0-rc1 and thought I had a fix in hand. It certainly fixed the failures on my local machine, but it unfortunately did not fix the problems in CI. I tried to debug in CI a bit, but it is a gnarly bug to investigate, what with plenty of processes the intentionally block each other, and no `gdb` to help. > In any case I also tried something like the below patch (sorry, > whitespace-broken). Oh, you reminded me that the `mingw_rename()` function looks _substantially_ different in Git for Windows than in Git. I did not have the time (or the strength of mind, more like) to upstream those changes yet. > But unfortunately this still caused permission errors when the new path > was held open by another process. Yes, this will _always_ be a problem, I think. The `FILE_RENAME_POSIX_SEMANTICS` as per its documentation should help, but if it does not in your tests it might actually not quite work as advertised (wouldn't be the first time I encounter such an issue). I tried to read through the code (it's a lot!) to figure out whether there is potentially any situation when the `tables.list` file is opened but not closed immediately, but couldn't find any. Do you know off-hand of any such scenario? > I think for now I'd still lean into the direction of adding the !WINDOWS > prerequisite to the test and increasing timeouts such that I can > continue to investigate without time pressure. Let me bang my head against this problem for a little while longer. You might be right, though, that this is a thing we cannot fix in time for v2.47.0, which would be sad. Ciao, Johannes > > Patrick > > @ -2152,9 +2152,12 @@ int mingw_accept(int sockfd1, struct sockaddr *sa, socklen_t *sz) > int mingw_rename(const char *pold, const char *pnew) > { > DWORD attrs, gle; > - int tries = 0; > + int tries = 0, wpnew_len, wpold_len; > wchar_t wpold[MAX_PATH], wpnew[MAX_PATH]; > - if (xutftowcs_path(wpold, pold) < 0 || xutftowcs_path(wpnew, pnew) < 0) > + > + wpold_len = xutftowcs_path(wpold, pold); > + wpnew_len = xutftowcs_path(wpnew, pnew); > + if (wpold_len < 0 || wpnew_len < 0) > return -1; > > /* > @@ -2166,6 +2169,58 @@ int mingw_rename(const char *pold, const char *pnew) > if (errno != EEXIST) > return -1; > repeat: > + { > +#define FileRenameInfoEx 22 > + enum { > + FILE_RENAME_REPLACE_IF_EXISTS = 0x01, > + FILE_RENAME_POSIX_SEMANTICS = 0x02, > + FILE_RENAME_IGNORE_READONLY_ATTRIBUTE = 0x40, > + }; > + typedef struct FILE_RENAME_INFORMATION { > + union { > + BOOLEAN ReplaceIfExists; > + ULONG Flags; > + }; > + HANDLE RootDirectory; > + ULONG FileNameLength; > + WCHAR FileName[1]; > + } *PFILE_RENAME_INFORMATION; > + size_t wpnew_size = wpnew_len * sizeof(wchar_t); > + PFILE_RENAME_INFORMATION fri = NULL; > + HANDLE handle = NULL; > + BOOL success; > + int ret; > + > + handle = CreateFileW(wpold, DELETE, > + FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL, > + OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL); > + if (handle == INVALID_HANDLE_VALUE) { > + errno = ENOENT; > + ret = -1; > + goto out; > + } > + > + fri = xcalloc(1, sizeof(*fri) + wpnew_size); > + fri->Flags = FILE_RENAME_REPLACE_IF_EXISTS | FILE_RENAME_POSIX_SEMANTICS | > + FILE_RENAME_IGNORE_READONLY_ATTRIBUTE; > + fri->FileNameLength = wpnew_len; > + memcpy(fri->FileName, wpnew, wpnew_size); > + > + success = SetFileInformationByHandle(handle, FileRenameInfoEx, > + fri, sizeof(*fri) + wpnew_size); > + if (!success) { > + errno = err_win_to_posix(GetLastError()); > + ret = -1; > + goto out; > + } > + > + ret = 0; > +out: > + CloseHandle(handle); > + free(fri); > + return ret; > + } > + > >