Re: fsmonitor: t7527 racy on OSX?


On 11/22/22 5:12 PM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Nov 22 2022, Eric DeCosta wrote:

-----Original Message-----
From: Đoàn Trần Công Danh <congdanhqx@xxxxxxxxx>
Sent: Monday, November 21, 2022 8:39 AM
To: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
Cc: Git ML <git@xxxxxxxxxxxxxxx>; Eric DeCosta
<edecosta@xxxxxxxxxxxxx>; Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
Subject: Re: fsmonitor: t7527 racy on OSX?

On 2022-11-21 14:07:13+0100, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
I have access to a Mac OS X M1 box (gcc104 at [1]) where t7527
reliably fails due to what seems to be a race between us doing
something and then assuming that fsmonitor has picked up on it.

See also https://lore.kernel.org/git/YvZbGAf+82WtNXcJ@xxxxxxxx/

which I raised 3 months ago; it seems like Jeff Hostetler has been too busy.


This makes the tests pass:

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 56c0dfffea..ce2555d558 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -428,6 +428,7 @@ test_expect_success 'edit some files' '
start_daemon --tf "$PWD/.git/trace" &&

edit_files &&
+ sleep 1 &&

test-tool fsmonitor-client query --token 0 &&

@@ -443,6 +444,7 @@ test_expect_success 'create some files' '
start_daemon --tf "$PWD/.git/trace" &&

create_files &&
+ sleep 1 &&

test-tool fsmonitor-client query --token 0 &&

@@ -471,6 +473,7 @@ test_expect_success 'rename some files' '
start_daemon --tf "$PWD/.git/trace" &&

rename_files &&
+ sleep 1 &&

test-tool fsmonitor-client query --token 0 &&

@@ -978,6 +981,7 @@ test_expect_success !UNICODE_COMPOSITION_SENSITIVE 'Unicode nfc/nfd' '
mkdir test_unicode/nfd/d_${utf8_nfd} &&

git -C test_unicode fsmonitor--daemon stop &&
+ sleep 1 &&

if test_have_prereq UNICODE_NFC_PRESERVED
then

The failure is when we grep out the events we expect, which aren't
there, but if you manually inspect them they're there. I.e. they're
just not "in" yet.

I thought this might be a lack of flushing or syncing in our own trace
code, but adding an fsync() to trace_write() didn't do the trick.

1. https://cfarm.tetaneutral.net/news/41#

--
Danh

Honestly, I'm not surprised. Stopping the daemon and grepping for
expected results immediately thereafter is just asking for these
sorts of races. Sleeping is a bit ugly, but without an explicit means
of synchronization it is probably the best that can be done. I can take
a look at it some more as I have access to M1 Macs.
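
To illustrate the trade-off between a fixed sleep and some form of
synchronization, here is a minimal sketch of a polling helper. The name
wait_for_pattern, its arguments, and the default retry count are
hypothetical; nothing like it exists in the test suite as far as this
thread shows. It only bounds the wait, it does not remove the
underlying race.

```shell
# Hypothetical helper (an assumption, not part of git's test library):
# poll FILE for PATTERN up to TRIES times, one second apart, instead of
# sleeping for a fixed amount of time.
wait_for_pattern () {
	pattern=$1 file=$2 tries=${3:-10}
	while test "$tries" -gt 0
	do
		# Succeed as soon as the pattern shows up in the file.
		grep -q -- "$pattern" "$file" 2>/dev/null && return 0
		tries=$((tries - 1))
		sleep 1
	done
	# The pattern never appeared within the allotted tries.
	return 1
}
```

A test could call this after edit_files with whatever pattern it later
greps for, so the common case returns as soon as the event is visible
and only a genuine failure pays the full timeout.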

I don't see why it would have to do with stopping the daemon. If
anything that should reduce the odds that you're running into a
race. I.e. on OSX in general this will work:

	echo foo >f &&
	grep foo f

Or, the equivalent with an "echo" that's not a shell built-in. I.e. we
had a process start, print to a file, and then we grep data out of it
again.

The reason I'm saying it should reduce them is that if the "echo" were
some long-running daemon process that was still running, the "grep"
might fail because the "foo" was still in some buffer and hadn't been
written or fsync'd to disk.

Anyway, all of that seems inapplicable to these failures, as we're not
stopping the daemon yet by the time we run into the synchronization
problem. We just *started* it, then renamed some files, but when we ask
for those events we don't get them back.

Maybe there's some innocuous reason for that, but I have the sinking
feeling that it might be some race between creating the files, the
kernel getting those events, acting on them, but not having sent notice
of those events to the daemon that's listening.

*That* would be much scarier, and would mean that this fsmonitor
implementation would be racy outside of our tests, wouldn't it?

I managed to reliably reproduce this on my new M1 Mac (while
working on replacing the call to the deprecated FSEvents routine
mentioned in another thread).

I should have a fix for this/them shortly.

Thanks for your patience.
Jeff



