Re: [PATCH v3] packfile: freshen the mtime of packfile by configuration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 19, 2021 at 07:53:19PM +0000, Sun Chao via GitGitGadget wrote:
> From: Sun Chao <16657101987@xxxxxxx>
>
> Commit 33d4221c79 (write_sha1_file: freshen existing objects,
> 2014-10-15) avoid writing existing objects by freshen their
> mtime (especially the packfiles contains them) in order to
> aid the correct caching, and some process like find_lru_pack
> can make good decision. However, this is unfriendly to
> incremental backup jobs or services rely on cached file system
> when there are large '.pack' files exists.
>
> For example, after packed all objects, use 'write-tree' to
> create same commit with the same tree and same environments
> such like GIT_COMMITTER_DATE and GIT_AUTHOR_DATE, we can
> notice the '.pack' file's mtime changed. Git servers
> that mount the same NFS disk will re-sync the '.pack' files
> to cached file system which will slow the git commands.
>
> So if add core.freshenPackfiles to indicate whether or not
> packs can be freshened, turning off this option on some
> servers can speed up the execution of some commands on servers
> which use NFS disk instead of local disk.

Hmm. I'm still quite unconvinced that we should be taking this direction
without better motivation. We talked about your assumption that NFS
seems to be invalidating the block cache when updating the inodes that
point at those blocks, but I don't recall seeing further evidence.

Regardless, a couple of idle thoughts:

> +	if (!core_freshen_packfiles)
> +		return 1;

It is important to still freshen the object mtimes even when we cannot
update the pack mtimes. That's why we return 0 when "freshen_file"
returned 0: even if there was an error calling utime, we should still
freshen the object. This is important because it impacts when
unreachable objects are pruned.

So I would have assumed that if a user set "core.freshenPackfiles=false"
that they would still want their object mtimes updated, in which case
the only option we have is to write those objects out loose.

...and that happens by the caller of freshen_packed_object (either
write_object_file() or hash_object_file_literally()) then calling
write_loose_object() if freshen_packed_object() failed. So I would have
expected to see a "return 0" in the case that packfile freshening was
disabled.

But that leads us to an interesting problem: how many redundant objects
do we expect to see on the server? It may be a lot, in which case you
may end up having the same IO problems for a different reason. Peff
mentioned to me off-list that he suspected write-tree was overeager in
how many trees it would try to write out. I'm not sure.

> +test_expect_success 'do not bother loosening old objects without freshen pack time' '
> +	obj1=$(echo three | git hash-object -w --stdin) &&
> +	obj2=$(echo four | git hash-object -w --stdin) &&
> +	pack1=$(echo $obj1 | git -c core.freshenPackFiles=false pack-objects .git/objects/pack/pack) &&
> +	pack2=$(echo $obj2 | git -c core.freshenPackFiles=false pack-objects .git/objects/pack/pack) &&
> +	git -c core.freshenPackFiles=false prune-packed &&
> +	git cat-file -p $obj1 &&
> +	git cat-file -p $obj2 &&
> +	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
> +	git -c core.freshenPackFiles=false repack -A -d --unpack-unreachable=1.hour.ago &&
> +	git cat-file -p $obj1 &&
> +	test_must_fail git cat-file -p $obj2
> +'

I had a little bit of a hard time following this test. AFAICT, it
proceeds as follows:

  - Write two packs, each containing a unique unreachable blob.
  - Call 'git prune-packed' with packfile freshening disabled, then
    check that the object survived.
  - Then repack while in a state where one of the pack's contents would
    be pruned.
  - Make sure that one object survives and the other doesn't.

This doesn't really seem to be testing the behavior of disabling
packfile freshening so much as it's testing prune-packed, and repack's
`--unpack-unreachable` option. I would probably have expected to see
something more along the lines of:

  - Write an unreachable object, pack it, and then remove the loose copy
    you wrote in the first place.
  - Then roll the pack's mtime to some fixed value in the past.
  - Try to write the same object again with packfile freshening
    disabled, and verify that:
    - the pack's mtime was unchanged,
    - the object exists loose again

But I'd really like to get some other opinions (especially from Peff,
who brought up the potential concerns with write-tree) as to whether or
not this is a direction worth pursuing.

My opinion is that it is not, and that the bizarre caching behavior you
are seeing is out of Git's control.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux