On Wed, Sep 8, 2021 at 12:12 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Neeraj Singh <nksingh85@xxxxxxxxx> writes:
>
> > On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >>
> >> Neeraj Singh <nksingh85@xxxxxxxxx> writes:
> >>
> >> > BTW, I updated the github PR to enable batch mode everywhere, and all
> >> > the tests passed, which is good news to me.
> >>
> >> I doubt that fsyncObjectFiles is something we can reliably test in
> >> CI, either with the new batched thing or with the original "when we
> >> close one, make sure the changes hit the disk platter" approach.  So
> >> I am not sure what conclusion we should draw from such an experiment,
> >> other than "ok, it compiles cleanly."  After all, unless we cause
> >> system crashes, what we thought we have written and close(2) would
> >> be seen by another process that we spawn after that, with or without
> >> sync, no?
> >
> > The main failure mode I was worried about is that some test or other part
> > of Git is relying on a loose object being immediately available after it is
> > added to the ODB. With batch mode, the loose objects aren't actually
> > available until the bulk checkin is unplugged.
>
> Ah, I see.  If there are two processes that communicate over pipes
> to decide whose turn it is (perhaps a producer of data that feeds
> fast-import may wait for fast-import to say "I gave this label to
> the object you requested" and goes ahead to use that object), and at
> the point that the "other" process takes its turn, if the objects
> are not "flushed" yet, things can break.  That's a valid concern.

That's right. This appears to be a possibility in the existing bulk
checkin code that produces packfiles for large objects as well, but my
change makes the situation much more common.
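
For concreteness, here is a rough sketch of the window we are talking
about.  plug_bulk_checkin()/unplug_bulk_checkin() are the existing
declarations in bulk-checkin.h; the write_blobs_batched() helper is
hypothetical and only meant to illustrate the visibility hazard, not
to reflect the exact shape of the series:

    #include "cache.h"
    #include "bulk-checkin.h"
    #include "object-store.h"

    /*
     * Hypothetical helper: write a batch of blobs under one
     * plugged bulk-checkin region.
     */
    static void write_blobs_batched(struct strbuf *bufs,
                                    struct object_id *oids,
                                    size_t nr)
    {
        size_t i;

        /*
         * With batch mode, loose objects written inside the
         * plugged region may have their fsync (and hence their
         * final visibility in the ODB) deferred.
         */
        plug_bulk_checkin();

        for (i = 0; i < nr; i++)
            write_object_file(bufs[i].buf, bufs[i].len,
                              "blob", &oids[i]);

        /*
         * Only after this point is it safe to announce oids[] to
         * another process (e.g. over a pipe); a reader spawned
         * before the unplug may fail to find the objects.
         */
        unplug_bulk_checkin();
    }

So any caller that hands an object name to a cooperating process must
make sure the unplug has happened first, which is the constraint the
tests were (implicitly) exercising.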