On Wed, Sep 08 2021, Neeraj Singh wrote: > On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: >> >> Neeraj Singh <nksingh85@xxxxxxxxx> writes: >> >> > BTW, I updated the github PR to enable batch mode everywhere, and all >> > the tests passed, which is good news to me. >> >> I doubt that fsyncObjectFiles is something we can reliably test in >> CI, either with the new batched thing or with the original "when we >> close one, make sure the changes hit the disk platter" approach. So >> I am not sure what conclusion we should draw from such an experiment, >> other than "ok, it compiles cleanly." After all, unless we cause >> system crashes, what we thought we have written and close(2) would >> be seen by another process that we spawn after that, with or without >> sync, no? > > The main failure mode I was worried about is that some test or other part > of Git is relying on a loose object being immediately available after it is > added to the ODB. With batch mode, the loose objects aren't actually > available until the bulk checkin is unplugged. > > I agree that it is not easy to test whether the data is actually going > to durable > storage at the expected time. FWIW, I did take a disk IO trace on Windows to > verify that we are issuing disk writes and flushes at the right time. > But that's a > one-time test that would be hard to make automated. I have some semi-related patches I need to dig up and finish sometime which add a "git gc" test mode to the test suite, i.e. any time we call "git gc --auto" it will go ahead and actually run, and some adversarial options to run always, right away, prune with --expire=now. It found some false positives, but also some genuine races and bugs at the time. Similarly, I think a good longer term goal for better fsync() and data integrity in git is to refactor the various codepaths where we write to disk (grepping for fsync_or_die() is a good start to find those) to all live in one place, we could then easily instrument that code to run in a hostile test mode. E.g. make anything that expects to write out a "foo" file actually write out "foo.not-synced-yet" as long as fsync() etc. hasn't been called, or with signals/timers/atexit() handlers fake up known FS edge cases such as a write of "foo" only renaming "foo.not-synced-yet" to "foo" 1s after the last close() call not followed by an fsync, etc. Anyway, I expect given your occupation that you may have better ideas in that area, presumably needing to instrument and test behavior under I/O pressure, deferred syncs etc. is something mature FS's need to deal with as part of their own regression tests... 1. https://lore.kernel.org/git/cover-v2-0.4-0000000000-20210908T003631Z-avarab@xxxxxxxxx/