> On Jul 20, 2021, at 14:19, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> On Mon, Jul 19 2021, Taylor Blau wrote:
>
>> On Mon, Jul 19, 2021 at 07:53:19PM +0000, Sun Chao via GitGitGadget wrote:
>>> From: Sun Chao <16657101987@xxxxxxx>
>>>
>>> Commit 33d4221c79 (write_sha1_file: freshen existing objects,
>>> 2014-10-15) avoids writing existing objects again by freshening
>>> their mtime (in particular that of the packfiles containing them)
>>> in order to aid correct caching, so that logic such as
>>> find_lru_pack can make good decisions. However, this is unfriendly
>>> to incremental backup jobs, or to services relying on a cached
>>> file system, when large '.pack' files exist.
>>>
>>> For example, after packing all objects, if we use 'write-tree' to
>>> create the same commit with the same tree and the same environment
>>> (e.g. GIT_COMMITTER_DATE and GIT_AUTHOR_DATE), we can see the
>>> '.pack' file's mtime change. Git servers that mount the same NFS
>>> disk will re-sync the '.pack' files to the cached file system,
>>> which slows down the git commands.
>>>
>>> So add core.freshenPackfiles to indicate whether or not packs can
>>> be freshened; turning this option off can speed up the execution
>>> of some commands on servers which use an NFS disk instead of a
>>> local disk.
>>
>> Hmm. I'm still quite unconvinced that we should be taking this
>> direction without better motivation. We talked about your assumption
>> that NFS seems to be invalidating the block cache when updating the
>> inodes that point at those blocks, but I don't recall seeing further
>> evidence.
>
> I don't know about Sun's setup, but what he's describing is consistent
> with how NFS works, or can commonly be made to work.
>
> See e.g. "lookupcache" in nfs(5) on Linux, but also a lot of people
> use some random vendor's proprietary NFS implementation, and commonly
> tweak various options that make it anywhere between "I guess that's
> not too crazy" and "are you kidding me?" levels of non-POSIX
> compliance.
>
>> Regardless, a couple of idle thoughts:
>>
>>> +	if (!core_freshen_packfiles)
>>> +		return 1;
>>
>> It is important to still freshen the object mtimes even when we
>> cannot update the pack mtimes. That's why we return 0 when
>> "freshen_file" returned 0: even if there was an error calling utime,
>> we should still freshen the object. This is important because it
>> impacts when unreachable objects are pruned.
>>
>> So I would have assumed that if a user set
>> "core.freshenPackfiles=false" they would still want their object
>> mtimes updated, in which case the only option we have is to write
>> those objects out loose.
>>
>> ...and that happens by the caller of freshen_packed_object (either
>> write_object_file() or hash_object_file_literally()) then calling
>> write_loose_object() if freshen_packed_object() failed. So I would
>> have expected to see a "return 0" in the case that packfile
>> freshening was disabled.
>>
>> But that leads us to an interesting problem: how many redundant
>> objects do we expect to see on the server? It may be a lot, in which
>> case you may end up having the same IO problems for a different
>> reason. Peff mentioned to me off-list that he suspected write-tree
>> was overeager in how many trees it would try to write out. I'm not
>> sure.
>
> In my experience with NFS the thing that kills you is anything that
> needs to do iteration, i.e. recursive readdir() and the like; reading
> a lot of data was fine, the throughput was excellent. It's why I
> hacked up that core.checkCollisions patch.

I have read your patch; it looks like a good idea to reduce expensive
operations like readdir(). In my production environment it is the IOPS
load on the NFS server that bothers me and makes the git commands slow.
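In case it helps to make the freshening concrete, it is easy to
observe locally. Roughly (a sketch against a fresh clone with a
checkout, so the index's tree already exists in the pack):

    # in a fresh clone: everything is packed, no loose objects
    git repack -ad
    ls -l .git/objects/pack/*.pack

    # write-tree re-writes a tree that already exists in the pack;
    # no new object is stored, but freshen_packed_object() bumps the
    # pack's mtime via utime(2)
    git write-tree
    ls -l .git/objects/pack/*.pack   # mtime has changed to "now"

And the caching behavior I see looks consistent with the "lookupcache"
and attribute-cache knobs in nfs(5). The mount options below are only
an illustration of the two extremes, not my exact production settings:

    # conservative: attribute changes (e.g. a freshened pack mtime)
    # are noticed quickly, at the cost of many more GETATTR round-trips
    mount -t nfs -o lookupcache=none,noac nfs-server:/export/git /srv/git

    # aggressive: far fewer round-trips, but a changed mtime on a
    # large '.pack' file still invalidates the client's cached
    # attributes once the timeout expires, triggering the re-sync
    mount -t nfs -o lookupcache=all,actimeo=600 nfs-server:/export/git /srv/git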
> Jeff improved the situation I was mainly trying to fix with the loose
> objects cache. I never got around to benchmarking the two in
> production, and now that setup is at an ex-job...
>
>>> +test_expect_success 'do not bother loosening old objects without freshen pack time' '
>>> +	obj1=$(echo three | git hash-object -w --stdin) &&
>>> +	obj2=$(echo four | git hash-object -w --stdin) &&
>>> +	pack1=$(echo $obj1 | git -c core.freshenPackFiles=false pack-objects .git/objects/pack/pack) &&
>>> +	pack2=$(echo $obj2 | git -c core.freshenPackFiles=false pack-objects .git/objects/pack/pack) &&
>>> +	git -c core.freshenPackFiles=false prune-packed &&
>>> +	git cat-file -p $obj1 &&
>>> +	git cat-file -p $obj2 &&
>>> +	test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
>>> +	git -c core.freshenPackFiles=false repack -A -d --unpack-unreachable=1.hour.ago &&
>>> +	git cat-file -p $obj1 &&
>>> +	test_must_fail git cat-file -p $obj2
>>> +'
>>
>> I had a little bit of a hard time following this test. AFAICT, it
>> proceeds as follows:
>>
>>   - Write two packs, each containing a unique unreachable blob.
>>
>>   - Call 'git prune-packed' with packfile freshening disabled, then
>>     check that the object survived.
>>
>>   - Then repack while in a state where one of the pack's contents
>>     would be pruned.
>>
>>   - Make sure that one object survives and the other doesn't.
>>
>> This doesn't really seem to be testing the behavior of disabling
>> packfile freshening so much as it's testing prune-packed, and
>> repack's `--unpack-unreachable` option. I would probably have
>> expected to see something more along the lines of:
>>
>>   - Write an unreachable object, pack it, and then remove the loose
>>     copy you wrote in the first place.
>>
>>   - Then roll the pack's mtime to some fixed value in the past.
>>
>>   - Try to write the same object again with packfile freshening
>>     disabled, and verify that:
>>
>>     - the pack's mtime was unchanged,
>>
>>     - the object exists loose again
>>
>> But I'd really like to get some other opinions (especially from
>> Peff, who brought up the potential concerns with write-tree) as to
>> whether or not this is a direction worth pursuing.
>>
>> My opinion is that it is not, and that the bizarre caching behavior
>> you are seeing is out of Git's control.
>
> Thanks for this, I found the test hard to follow too, but didn't have
> time to really think about it; this makes sense.
>
> Back to the topic: I share your sentiment of trying to avoid
> complexity in this area.
>
> Sun: Have you considered --keep-unreachable to "git repack"? It's
> orthogonal to what you're trying here, but I wonder if being more
> aggressive about keeping objects + some improvements to perhaps skip
> this "touch" at all if we have that in effect wouldn't be more viable
> & something e.g. Taylor would be more comfortable having as part of
> git.git.

Yes, I will try to create some more useful test cases with both
`--keep-unreachable` and `--unpack-unreachable` if I still believe I
need to do something with the mtime; in any case, I need to get my
test reports first, thanks ;)
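As a concrete starting point, the test you outlined above might look
something like this. It is untested, and assumes both the
core.freshenPackFiles name from my patch and the "return 0" semantics
Taylor described (i.e. the object gets written loose when pack
freshening is disabled):

    test_expect_success 'disabled freshening keeps pack mtime, loosens object' '
    	obj=$(echo five | git hash-object -w --stdin) &&
    	pack=$(echo $obj | git pack-objects .git/objects/pack/pack) &&
    	git prune-packed &&
    	test_path_is_missing .git/objects/$(test_oid_to_path $obj) &&

    	# roll the pack mtime to a fixed point in the past
    	test-tool chmtime =-86400 .git/objects/pack/pack-$pack.pack &&
    	before=$(test-tool chmtime --get .git/objects/pack/pack-$pack.pack) &&

    	# try to write the same object again with freshening disabled
    	obj2=$(echo five | git -c core.freshenPackFiles=false \
    		hash-object -w --stdin) &&
    	test "$obj" = "$obj2" &&

    	# the pack mtime must be unchanged...
    	after=$(test-tool chmtime --get .git/objects/pack/pack-$pack.pack) &&
    	test "$before" = "$after" &&

    	# ...and the object must exist loose again
    	test_path_is_file .git/objects/$(test_oid_to_path $obj)
    '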