Re: RFC: A configuration design for future-proofing fsync() configuration


 



Neeraj Singh <nksingh85@xxxxxxxxx> writes:

> After sleeping on it for a while, I'm willing to consolidate the
> configuration along the lines that you've specified, but I'd like to
> reduce the number of degrees of freedom.
>
> My proposal in Documentation form:
>
> core.fsync::
> A comma-separated list of parts of the repository which should be hardened by
> calling fsync when created or modified. When an aggregate option is
> specified, a subcomponent can be overridden by prefixing it with a '-'. For
> example, `core.fsync=all,-index` means "fsync everything except the index".
> Items which are not fsync'ed may be lost in the event of an unclean system
> shutdown. This setting defaults to `objects,-loose-objects`.
> +
> * `loose-objects` hardens objects added to the repo in loose-object form.
> * `packs` hardens objects added to the repo in packfile form and the related
>   bitmap and index files.
> * `commit-graph` hardens the commit graph file.
> * `refs` (future) hardens references when they are modified.
> * `index` (future) hardens the index when it is modified.
> * `objects` is an aggregate option that includes `loose-objects`, `packs`, and
>   `commit-graph`.
> * `all` is an aggregate option that syncs all individual components above.
> * `none` is an aggregate option that disables fsync completely.
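
A minimal C sketch of how a value like the proposed `core.fsync` could be turned into a bitmask. It assumes the tokens are applied left to right: a plain name adds its components, a '-' prefix removes them, and `none` clears everything. The macro names and parse_fsync_components() are hypothetical illustrations, not Git's actual code.

```c
#include <string.h>

/* Hypothetical bits for the components named in the proposal. */
#define FSYNC_LOOSE_OBJECTS (1u << 0)
#define FSYNC_PACKS         (1u << 1)
#define FSYNC_COMMIT_GRAPH  (1u << 2)
#define FSYNC_REFS          (1u << 3) /* "(future)" in the proposal */
#define FSYNC_INDEX         (1u << 4) /* "(future)" in the proposal */
#define FSYNC_OBJECTS (FSYNC_LOOSE_OBJECTS | FSYNC_PACKS | FSYNC_COMMIT_GRAPH)
#define FSYNC_ALL     (FSYNC_OBJECTS | FSYNC_REFS | FSYNC_INDEX)

static const struct {
	const char *name;
	unsigned bits;
} fsync_names[] = {
	{ "loose-objects", FSYNC_LOOSE_OBJECTS },
	{ "packs", FSYNC_PACKS },
	{ "commit-graph", FSYNC_COMMIT_GRAPH },
	{ "refs", FSYNC_REFS },
	{ "index", FSYNC_INDEX },
	{ "objects", FSYNC_OBJECTS },
	{ "all", FSYNC_ALL },
	{ "none", 0 },
};

/*
 * Parse a comma-separated value such as "all,-index", left to right.
 * Returns 0 and stores the mask in *out, or -1 on an unknown or empty token.
 */
static int parse_fsync_components(const char *value, unsigned *out)
{
	unsigned mask = 0;
	const char *p = value;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);
		int negate = 0, found = 0;
		size_t i;

		if (len && *p == '-') { /* "-name" removes that component */
			negate = 1;
			p++;
			len--;
		}
		for (i = 0; i < sizeof(fsync_names) / sizeof(fsync_names[0]); i++) {
			if (strlen(fsync_names[i].name) != len ||
			    strncmp(fsync_names[i].name, p, len))
				continue;
			found = 1;
			if (!fsync_names[i].bits) { /* "none" */
				if (negate)
					return -1; /* "-none" is meaningless */
				mask = 0;
			} else if (negate) {
				mask &= ~fsync_names[i].bits;
			} else {
				mask |= fsync_names[i].bits;
			}
			break;
		}
		if (!found)
			return -1;
		p += len; /* now at ',' or the terminating NUL */
		if (*p == ',')
			p++;
	}
	*out = mask;
	return 0;
}
```

With this left-to-right rule, `all,-index` yields everything except the index, and the proposed default `objects,-loose-objects` yields `packs` plus `commit-graph`. The order matters, though: `-index,all` would put the index back, so a real implementation might prefer to apply all removals last.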

I wasn't closely following the discussion at all, but the above
simplification may still even be too fine-grained?  For example,
what does it mean to care less about the robustness of loose objects
than packs or ref updates?  How does an existing fine-grained
classification interact with new classes of filesystem entity we
will introduce under .git in the future?  Imagine that we didn't
have .midx and multi-pack bitmap yet; since 'loose-objects',
'packs', and 'commit-graph' are the only three groups we can choose
to place any "objects and reachability" related data in, we need to
pick one, and choosing 'packs' class may be the choice of least
resistance, the default kitchen-sync category for anything related
to "object".  Or just like 'commit-graph' has its own category,
would we invent a new class and call it 'multi-pack'?

I cannot shake the feeling that these are making everything
unnecessarily complex and adding more things that we need to explain
to the end-user---and the worst part is I doubt it would help the
end-users very much to understand what gets explained.

> core.fsyncMethod::
> A value indicating the strategy Git will use to harden repository data using
> fsync and related primitives.
> +
> * 'default' uses the fsync(2) system call or platform equivalents.
> * 'batch' uses APIs such as sync_file_range or equivalent to reduce the number
>   of hardware FLUSH CACHE requests sent to the storage hardware.
> * 'writeout-only' (future) issues requests to send the writes to the storage
>   hardware, but does not send any FLUSH CACHE request.
> * 'syncfs' (future) uses the syncfs API, where available, to sync all of the
>   files on the same filesystem as the Git repo.
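
The proposal does not spell out how 'batch' cuts down on FLUSH CACHE requests. One plausible reading, sketched in C under stated assumptions: write every file out with a writeout-only request (Linux sync_file_range(), falling back to fsync() where it is unavailable), then issue a single fsync() whose device cache flush is assumed to cover the earlier writes. That assumption concerns the drive's volatile write cache only; it is not a guarantee for filesystem metadata. The names enum fsync_method, parse_fsync_method(), writeout_only() and harden_files() are hypothetical, not Git's code.

```c
#define _GNU_SOURCE /* asks glibc to declare sync_file_range() */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical enum mirroring two of the proposed core.fsyncMethod values. */
enum fsync_method {
	FSYNC_METHOD_DEFAULT,
	FSYNC_METHOD_BATCH,
	FSYNC_METHOD_UNKNOWN
};

static enum fsync_method parse_fsync_method(const char *value)
{
	if (!strcmp(value, "default"))
		return FSYNC_METHOD_DEFAULT;
	if (!strcmp(value, "batch"))
		return FSYNC_METHOD_BATCH;
	return FSYNC_METHOD_UNKNOWN;
}

/* Push a file's dirty pages to the device without requesting a cache flush. */
static int writeout_only(int fd)
{
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
	/* nbytes == 0 means "through end of file"; wait for the writeback to finish. */
	if (!sync_file_range(fd, 0, 0,
			     SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
			     SYNC_FILE_RANGE_WAIT_AFTER))
		return 0;
#endif
	/* No writeout-only primitive available (or it failed): do a full fsync. */
	return fsync(fd);
}

/* Harden n open files; returns 0 on success, -1 on the first failure. */
static int harden_files(const int *fds, int n, enum fsync_method method)
{
	int i;

	if (method != FSYNC_METHOD_BATCH) {
		/* 'default' (and, conservatively, anything unknown): one fsync per file. */
		for (i = 0; i < n; i++)
			if (fsync(fds[i]) < 0)
				return -1;
		return 0;
	}

	/* 'batch': write every file out first ... */
	for (i = 0; i < n; i++)
		if (writeout_only(fds[i]) < 0)
			return -1;
	/* ... then one fsync, whose cache flush is assumed to cover all of them. */
	return n > 0 ? fsync(fds[n - 1]) : 0;
}
```

For n files, 'default' here issues n fsync() calls, while 'batch' issues n writeout-only requests plus a single fsync().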

How would an end-user choose among these?  If they assume that the
version of Git they use is bug-free, is there a reason why they
should ever pick 'default' over 'batch', for example?  Shouldn't we
be the one to choose the best approach on the underlying filesystem
for the users, instead of forcing them to choose?

As implementors, these choices may be of interest and give you a
handy way to compare different designs, but I am not sure if we want
to give anything more complex than a binary choice, "default" and
"eatmydata".

> core.fsyncObjectFiles::
> If `true`, this legacy setting is equivalent to `core.fsync=objects`. If
> `core.fsync` is explicitly specified, then this setting is ignored.
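
The precedence described for the legacy knob fits in a small resolver. This sketch rests on one assumption the proposal leaves open: that `core.fsyncObjectFiles=false` (or unset) simply falls back to the proposed default `objects,-loose-objects`. effective_fsync_components() and the macro names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bits for the object-related components in the proposal. */
#define FSYNC_LOOSE_OBJECTS (1u << 0)
#define FSYNC_PACKS         (1u << 1)
#define FSYNC_COMMIT_GRAPH  (1u << 2)
#define FSYNC_OBJECTS (FSYNC_LOOSE_OBJECTS | FSYNC_PACKS | FSYNC_COMMIT_GRAPH)
/* Proposed default: objects,-loose-objects */
#define FSYNC_DEFAULT (FSYNC_OBJECTS & ~FSYNC_LOOSE_OBJECTS)

static unsigned effective_fsync_components(bool core_fsync_set,
					   unsigned core_fsync,
					   bool legacy_fsync_object_files)
{
	if (core_fsync_set)
		return core_fsync;     /* explicit core.fsync wins; legacy knob ignored */
	if (legacy_fsync_object_files)
		return FSYNC_OBJECTS;  /* legacy true == core.fsync=objects */
	return FSYNC_DEFAULT;          /* assumption: false/unset means "use the default" */
}
```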

I think deprecating this very-specific knob is a good idea,
regardless of how complex we'd want to make the alternative.

Thanks.


