Re: [PATCH v4 2/4] core.fsync: introduce granular fsync control

Neeraj Singh <nksingh85@xxxxxxxxx> · Fri, 11 Feb 2022 12:38:02 -0800

Apologies in advance for the delayed reply.  I've finally been able to
return to Git after an absence.

On Tue, Feb 1, 2022 at 4:51 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > +core.fsync::
> > +     A comma-separated list of parts of the repository which should be
> > +     hardened via the core.fsyncMethod when created or modified. You can
> > +     disable hardening of any component by prefixing it with a '-'. Later
> > +     items take precedence over earlier ones in the list. For example,
> > +     `core.fsync=all,-pack-metadata` means "harden everything except pack
> > +     metadata." Items that are not hardened may be lost in the event of an
> > +     unclean system shutdown.
> > ++
> > +* `none` disables fsync completely. This must be specified alone.
> > +* `loose-object` hardens objects added to the repo in loose-object form.
> > +* `pack` hardens objects added to the repo in packfile form.
> > +* `pack-metadata` hardens packfile bitmaps and indexes.
> > +* `commit-graph` hardens the commit graph file.
> > +* `objects` is an aggregate option that includes `loose-objects`, `pack`,
> > +  `pack-metadata`, and `commit-graph`.
> > +* `default` is an aggregate option that is equivalent to `objects,-loose-object`
> > +* `all` is an aggregate option that syncs all individual components above.
>
> I am not quite sure if this is way too complex (e.g. what does it
> mean that we do not care much about loose-object safety while we do
> care about commit-graph files?) and at the same time it is too
> limited (e.g. if it makes sense to say a class of items deserve more
> protection than another class of items, don't we want to be able to
> say "class X is ultra-precious so use method A on them, while class
> Y is mildly precious and use method B on them, everything else are
> not that important and doing the default thing is just fine").
>
> If we wanted to allow the "matrix" kind of flexibility, I think the
> way to do so would be
>
>         fsync.<class>.method = <value>
>
> e.g.
>
>         [fsync "default"] method = none
>         [fsync "loose-object"] method = fsync
>         [fsync "pack-metadata"] method = writeout-only
>

I don't believe it makes sense to offer a full matrix of what to fsync
and what method to use, since the method is a property of the
filesystem and OS the repo is running on, while the list of things to
fsync is more a selection of what the user values. So if I'm hosting
on APFS on macOS or NTFS on Windows, I'd want to set the fsyncMethod
to batch so that I can get good performance at the safety level I
choose.  If I'm working on my maintainer repo, I might not want to
fsync anything, but I'd want to fsync everything when working on my
developer repo.
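
As a rough sketch of that split (the values are only illustrative, and
the names could still change as this series evolves), the method would
be set once for the machine and the component list would vary per repo:

        # per-user config on an APFS or NTFS machine
        [core]
                fsyncMethod = batch

        # .git/config of the maintainer repo: fsync nothing
        [core]
                fsync = none

        # .git/config of the developer repo: fsync everything
        [core]
                fsync = all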

> Where do we expect users to take the core.fsync settings from?  Per
> repository?  If it is from per user (i.e. $HOME/.gitconfig), do
> people tend to share it across systems (not necessarily over NFS)
> with the same contents?  If so, I am not sure if fsync.method that
> is way too close to the actual "implementation" is a good idea to
> begin with.  From end-user's point of view, it may be easier to
> express "class X is ultra-precious, and class Y and Z are mildly
> so", with something like fsync.<class>.level = <how-precious> and
> let the Git implementation on each platform choose the appropriate
> fsync method to protect the stuff at that precious-ness.
>

I expect the vast majority of users to have whatever setting is baked
into their build of Git.  For the users that want to do something
different, I expect them to have core.fsyncMethod and core.fsync
configured per-user for the majority of their repos. Some repos might
have custom settings that override the per-user settings: 1) Ephemeral
repos that don't contain unique data would probably want to set
core.fsync=none. 2) Repos hosted on NFS or on a different FS may have
a stricter core.fsyncMethod setting.
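
To illustrate the layering I have in mind (a sketch only; the file
locations and the particular values are assumptions for the example):

        # $HOME/.gitconfig: per-user settings for most repos
        [core]
                fsyncMethod = writeout-only
                fsync = all,-pack-metadata

        # .git/config of an ephemeral repo with no unique data
        [core]
                fsync = none

        # .git/config of a repo hosted on NFS: stricter method
        [core]
                fsyncMethod = fsync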

(More text to follow in reply to your next email.)