Re: [RFC PATCH 2/2] core.fsyncObjectFiles: make the docs less flippant

On Thu, Oct 08 2020, Johannes Schindelin wrote:

> Hi Junio and Ævar,
>
> On Thu, 17 Sep 2020, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:
>>
>> > As amusing as Linus's original prose[1] is here, it doesn't really explain
>> > in any detail to the uninitiated why you would or wouldn't enable
>> > this, nor the counter-intuitive reason for why git wouldn't fsync your
>> > precious data.
>> >
>> > So elaborate (a lot) on why this may or may not be needed. This is my
>> > best-effort attempt to summarize the various points raised in the last
>> > ML[2] discussion about this.
>> >
>> > 1.  aafe9fbaf4 ("Add config option to enable 'fsync()' of object
>> >     files", 2008-06-18)
>> > 2. https://lore.kernel.org/git/20180117184828.31816-1-hch@xxxxxx/
>> >
>> > Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
>> > ---
>> >  Documentation/config/core.txt | 42 ++++++++++++++++++++++++++++++-----
>> >  1 file changed, 36 insertions(+), 6 deletions(-)
>>
>> When I saw the subject in my mailbox, I expected to see that you
>> would resurrect Christoph's updated text in [*1*], but you wrote a
>> whole lot more ;-) And they are quite informative in helping readers
>> understand what the option does.  I am not sure if that understanding
>> directly helps readers decide if it is appropriate for their own
>> repositories, though X-<.
>
> I agree that it is an improvement, and am therefore in favor of applying
> the patch.

Just the improved docs, or flipping the default of core.fsyncObjectFiles
to "true"?

I've been meaning to re-roll this. I won't have time anytime soon to fix
git's fsync() use, i.e. to ensure that we walk up and down the modified
directories and fsync()/fdatasync() the file and directory fds as
appropriate. But I think documenting it and changing the
core.fsyncObjectFiles default makes sense and is at least a step in the
right direction.

I do think it makes more sense for a v2 to split most of this out into
a section that discusses data integrity in the .git directory in
general, i.e. one that says that where we currently use fsync() (such as
for pack/commit-graph writes) we don't fsync() the corresponding
director{y,ies}, and that ref updates don't fsync() at all.

Where to put that, though? gitrepository-layout(5)? Or a new page like
gitrepository-integrity(5) (other suggestions welcome)?

Looking at the code again, it seems easier than I thought to make this
right if we ignore .git/refs (which reftable can fix for us). Just:

1. Change fsync_or_die() and its callsites to also pass/sync the
   containing directory, which always exists already
   (e.g. .git/objects/pack/)...

2. ...or, in the case where it doesn't exist yet, such as
   .git/objects/??/ (or .git/objects/pack/ itself), it's not N-deep like
   the refs hierarchy, so the "did we create it?" state is pretty simple;
   or we can just always sync it unconditionally.

3. Without reftable, the .git/refs/ case shouldn't be too hard if we're
   OK with redundantly fsyncing all the way down, i.e. keeping it
   simple by not tracking exactly which directories were changed.

4. Now that I'm writing this, there's also .git/{config,rr-cache} and any
   number of other things we'd need to change for 100% coverage, but
   1-3 above should be good enough for repo integrity, where repo = refs
   & objects.
