On Fri, Oct 04, 2024 at 01:11:15PM +0200, Johannes Schindelin wrote: > Hi Patrick, > > > > The issue on Win32 is different: we cannot commit the "tables.list" lock > > > > via rename(3P) because the target file may be open for reading by a > > > > concurrent process. I guess that Cygwin has proper POSIX semantics for > > > > rename(3P) and thus doesn't hit the same issue. > > > > > > Indeed, this is where the symptom lies. I worked around it in Git for > > > Windows v2.47.0-rc1 via this patch: > > > > > > -- snipsnap -- > > > diff --git a/compat/mingw.c b/compat/mingw.c > > > index c1030e40f227..56db193d1360 100644 > > > --- a/compat/mingw.c > > > +++ b/compat/mingw.c > > > @@ -784,6 +784,7 @@ int mingw_open (const char *filename, int oflags, ...) > > > int fd, create = (oflags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL); > > > wchar_t wfilename[MAX_LONG_PATH]; > > > open_fn_t open_fn; > > > + int tries = 0; > > > > > > va_start(args, oflags); > > > mode = va_arg(args, int); > > > @@ -813,7 +814,11 @@ int mingw_open (const char *filename, int oflags, ...) > > > else if (xutftowcs_long_path(wfilename, filename) < 0) > > > return -1; > > > > > > - fd = open_fn(wfilename, oflags, mode); > > > + do { > > > + fd = open_fn(wfilename, oflags, mode); > > > + } while (fd < 0 && GetLastError() == ERROR_SHARING_VIOLATION && > > > + retry_ask_yes_no(&tries, "Opening '%s' failed because another " > > > + "application accessed it. Try again?", filename)); > > > > > > if ((oflags & O_CREAT) && fd >= 0 && are_wsl_compatible_mode_bits_enabled()) { > > > _mode_t wsl_mode = S_IFREG | (mode&0777); > > > > Wait, this is surprising to me as I saw the error happening when calling > > rename, not open. So why would retries in `open()` fix the symptom? I'm > > probably missing something. > > I am sorry, I did not really read the bug report in detail, I only meant > to unblock Git for Windows v2.47.0-rc1 and thought I had a fix in hand. It > certainly fixed the failures on my local machine, but it unfortunately did > not fix the problems in CI. Nothing to be sorry about. Quite on the contrary, thanks for chiming in here, I was hoping for your input. > I tried to debug in CI a bit, but it is a gnarly bug to investigate, what > with plenty of processes the intentionally block each other, and no `gdb` > to help. Yup, indeed it is. > > In any case I also tried something like the below patch (sorry, > > whitespace-broken). > > Oh, you reminded me that the `mingw_rename()` function looks > _substantially_ different in Git for Windows than in Git. I did not have > the time (or the strength of mind, more like) to upstream those changes > yet. > > > But unfortunately this still caused permission errors when the new path > > was held open by another process. > > Yes, this will _always_ be a problem, I think. The > `FILE_RENAME_POSIX_SEMANTICS` as per its documentation should help, but if > it does not in your tests it might actually not quite work as advertised > (wouldn't be the first time I encounter such an issue). It could of course be that the code I wrote is just plain wrong. I had to stub out the struct as well as the constants because those are not available on my Windows system, due to whatever reason. Not even when bumping _WIN32_WINNT to 0x0A00 (W10+). Maybe this is because I'm using Windows 10, but I thought that thish should have been available starting with Windows 10 RC1. > I tried to read through the code (it's a lot!) to figure out whether there > is potentially any situation when the `tables.list` file is opened but not > closed immediately, but couldn't find any. Do you know off-hand of any > such scenario? We indeed do keep the `tables.list` file descriptor open as part of our stat cache. But that only happens on non-Windows system. This all happens in `reftable_stack_reload_maybe_reuse()` and is documented quite extensively in there. > > I think for now I'd still lean into the direction of adding the !WINDOWS > > prerequisite to the test and increasing timeouts such that I can > > continue to investigate without time pressure. > > Let me bang my head against this problem for a little while longer. You > might be right, though, that this is a thing we cannot fix in time for > v2.47.0, which would be sad. A bit sad, yes, but as I mentioned this is at least not a regression. The test just demonstrates that the improvements I did in this area are not yet sufficient for all platforms and that I need to spend some more time on it. Or that the central "tables.list" mechanism is not a good fit for Windows, which would be a shame. I'll send the patch as a reply to this thread so that it can be picked up as required. Patrick