On Mon, Aug 13 2018, Johannes Sixt wrote: > Am 13.08.2018 um 22:20 schrieb Junio C Hamano: >> Johannes Sixt <j6t@xxxxxxxx> writes: >> >>> The Windows CRT implements O_APPEND "manually": on write() calls, the >>> file pointer is set to EOF before the data is written. Clearly, this is >>> not atomic. And in fact, this is the root cause of failures observed in >>> t5552-skipping-fetch-negotiator.sh and t5503-tagfollow.sh, where >>> different processes write to the same trace file simultanously; it also >>> occurred in t5400-send-pack.sh, but there it was worked around in >>> 71406ed4d6 ("t5400: avoid concurrent writes into a trace file", >>> 2017-05-18). >>> >>> Fortunately, Windows does support atomic O_APPEND semantics using the >>> file access mode FILE_APPEND_DATA. Provide an implementation that does. >>> >>> This implementation is minimal in such a way that it only implements >>> the open modes that are actually used in the Git code base. Emulation >>> for other modes can be added as necessary later. To become aware of >>> the necessity early, the unusal error ENOSYS is reported if an >>> unsupported mode is encountered. >>> >>> Diagnosed-by: Johannes Schindelin <Johannes.Schindelin@xxxxxx> >>> Helped-by: Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> >>> Signed-off-by: Johannes Sixt <j6t@xxxxxxxx> >>> --- >>> compat/mingw.c | 41 +++++++++++++++++++++++++++++++++++++++-- >>> 1 file changed, 39 insertions(+), 2 deletions(-) >> >> Nice. >> >> I wonder how much more expensive using this implementation is >> compared with the original "race susceptible" open(), when raciness >> is known not to be an issue (e.g. there is higher level lock that >> protects the appending). > > Certainly, the former way that uses two syscalls > (SetFilePointer+WriteFile) is more costly than this new way with just > one syscall (WriteFile). Of course, I don't know how atomic append > would be implemented in the kernel, but I can't think of a reason why > it should be slow on Windows, but fast on POSIX. > > (But I can't provide numbers to back up my gut feeling...) > > (And I also assume that you are not worried about the performance of > open() itself.) > >> ...[define race_safe_append_open]... and replace >> the call to open(... O_APPEND ...) in trace.c::get_trace_fd() with a >> call to that wrapper. That way, other codepaths that use O_APPEND >> (namely, reflog and todo-list writers) can avoid the additional >> cost, if any. >> >> Some may find it beneficial from code readability POV because that >> approach marks the codepath that needs to have non-racy fd more >> explicitly. > > O_APPEND is POSIX and means race-free append. If you mark some call > sites with O_APPEND, then that must be the ones that need race-free > append. Hence, you would have to go the other route: Mark those call > sites that do _not_ need race-free append with some custom > function/macro. (Or mark both with different helpers and avoid writing > down O_APPEND.) O_APPEND in POSIX is race-free only up to PIPE_MAX bytes written at a time, which is e.g. 2^12 by default on linux, after that all bets are off and the kernel is free to interleave different write calls. I've written code (not for git.git) that implements such a "write_non_racy" function in the past, and the first thing it needs to do is to assert that the length of the buffer being written doesn't exceed PIPE_MAX. So there's still a use for a race_safe_append_open() wrapper function, to O_APPEND and do the PIPE_MAX assertion. Otherwise you're calling a "safe" function which isn't safe at all anymore. I have no idea what the equivalent of that PIPE_MAX caveat is on non-POSIX (e.g. Windows), but would be interested to find out.