Re: [PATCH] mingw: enable atomic O_APPEND

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 13 Aug 2018 23:22:10 +0200

On Mon, Aug 13 2018, Johannes Sixt wrote:

> Am 13.08.2018 um 22:20 schrieb Junio C Hamano:
>> Johannes Sixt <j6t@xxxxxxxx> writes:
>>
>>> The Windows CRT implements O_APPEND "manually": on write() calls, the
>>> file pointer is set to EOF before the data is written. Clearly, this is
>>> not atomic. And in fact, this is the root cause of failures observed in
>>> t5552-skipping-fetch-negotiator.sh and t5503-tagfollow.sh, where
>>> different processes write to the same trace file simultaneously; it also
>>> occurred in t5400-send-pack.sh, but there it was worked around in
>>> 71406ed4d6 ("t5400: avoid concurrent writes into a trace file",
>>> 2017-05-18).
>>>
>>> Fortunately, Windows does support atomic O_APPEND semantics using the
>>> file access mode FILE_APPEND_DATA. Provide an implementation that does.
>>>
>>> This implementation is minimal in such a way that it only implements
>>> the open modes that are actually used in the Git code base. Emulation
>>> for other modes can be added as necessary later. To become aware of
>>> the necessity early, the unusual error ENOSYS is reported if an
>>> unsupported mode is encountered.
>>>
>>> Diagnosed-by: Johannes Schindelin <Johannes.Schindelin@xxxxxx>
>>> Helped-by: Jeff Hostetler <git@xxxxxxxxxxxxxxxxx>
>>> Signed-off-by: Johannes Sixt <j6t@xxxxxxxx>
>>> ---
>>>   compat/mingw.c | 41 +++++++++++++++++++++++++++++++++++++++--
>>>   1 file changed, 39 insertions(+), 2 deletions(-)
>>
>> Nice.
>>
>> I wonder how much more expensive using this implementation is
>> compared with the original "race susceptible" open(), when raciness
>> is known not to be an issue (e.g. there is a higher-level lock that
>> protects the appending).
>
> Certainly, the former way that uses two syscalls
> (SetFilePointer+WriteFile) is more costly than this new way with just
> one syscall (WriteFile). Of course, I don't know how atomic append
> would be implemented in the kernel, but I can't think of a reason why
> it should be slow on Windows, but fast on POSIX.
>
> (But I can't provide numbers to back up my gut feeling...)
>
> (And I also assume that you are not worried about the performance of
> open() itself.)
>
>> ...[define race_safe_append_open]... and replace
>> the call to open(... O_APPEND ...) in trace.c::get_trace_fd() with a
>> call to that wrapper.  That way, other codepaths that use O_APPEND
>> (namely, reflog and todo-list writers) can avoid the additional
>> cost, if any.
>>
>> Some may find it beneficial from code readability POV because that
>> approach marks the codepath that needs to have non-racy fd more
>> explicitly.
>
> O_APPEND is POSIX and means race-free append. If you mark some call
> sites with O_APPEND, then that must be the ones that need race-free
> append. Hence, you would have to go the other route: Mark those call
> sites that do _not_ need race-free append with some custom
> function/macro. (Or mark both with different helpers and avoid writing
> down O_APPEND.)

O_APPEND in POSIX is race-free only up to PIPE_BUF bytes written at a
time, which is e.g. 2^12 (4096) on Linux; beyond that all bets are
off and the kernel is free to interleave different write calls.

I've written code (not for git.git) that implements such a
"write_non_racy" function in the past, and the first thing it needs to
do is to assert that the length of the buffer being written doesn't
exceed PIPE_BUF.

So there's still a use for a race_safe_append_open() wrapper function
that opens the file with O_APPEND and does the PIPE_BUF assertion.
Otherwise you're calling a "safe" function which isn't safe at all
anymore.

I have no idea what the equivalent of that PIPE_BUF caveat is on
non-POSIX systems (e.g. Windows), but would be interested to find out.