On Sat, Jul 10 2021, Sun Chao via GitGitGadget wrote:

> From: Sun Chao <16657101987@xxxxxxx>
>
> Commit 33d4221c79 (write_sha1_file: freshen existing objects,
> 2014-10-15) avoid writing existing objects by freshen their
> mtime (especially the packfiles contains them) in order to
> aid the correct caching, and some process like find_lru_pack
> can make good decision. However, this is unfriendly to
> incremental backup jobs or services rely on file system
> cache when there are large '.pack' files exists.
>
> For example, after packed all objects, use 'write-tree' to
> create same commit with the same tree and same environments
> such like GIT_COMMITTER_DATE and GIT_AUTHOR_DATE, we can
> notice the '.pack' file's mtime changed, but '.idx' file not.
>
> So if we update the mtime of packfile by updating the '.idx'
> file instead of '.pack' file, when we check the mtime
> of packfile, get it from '.idx' file instead. Large git
> repository may contains large '.pack' files, but '.idx'
> files are smaller enough, this can avoid file system cache
> reload the large files again and speed up git commands.
>
> Signed-off-by: Sun Chao <16657101987@xxxxxxx>

Does this have the unstated trade-off that in a mixed-version
environment (say two git versions coordinating writes to an NFS share),
where one is old and thinks *.pack needs updating and the other is new
and thinks *.idx is what should be checked, we're effectively back to
pre-33d4221c79 until both are upgraded?

I don't think it's a dealbreaker, just wondering if I've got that right
& if it's a trade-off you thought about. Maybe we should check the
mtime of both. The stat() is cheap, it's the re-sync that matters for
you.

But just to run with that thought, wouldn't it be even more helpful to
you to have, say, a config setting to create a *.bump file next to the
*.{idx,pack}? Then you'd have an empty file (the *.idx is smaller, but
still not empty), and as a patch it seems relatively simple, i.e. some
core.* or gc.* or pack.* setting changing what we touch/stat().
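
To make the two ideas concrete (stat() both files, and touch an empty
*.bump instead of the pack), here is a rough Python sketch of the
logic. It is not git code and not a proposed patch: the function names
freshen() and freshness_mtime(), and the idea that the suffix to touch
comes from some config setting, are assumptions made purely for
illustration.

```python
from pathlib import Path

# Hypothetical illustration only; none of these names exist in git.
# Files whose mtime may carry the "this pack was recently used" signal.
CANDIDATE_SUFFIXES = (".pack", ".idx", ".bump")


def freshen(pack: Path, suffix: str = ".bump") -> None:
    """Bump the mtime of the pack's sibling file with the given suffix.

    With ".bump" this creates an empty marker file if none exists, so
    neither the large *.pack nor the *.idx is modified.
    """
    pack.with_suffix(suffix).touch(exist_ok=True)


def freshness_mtime(pack: Path) -> float:
    """Return the newest mtime among the sibling files that exist.

    Taking the maximum means a reader still sees the update whether the
    writer touched *.pack (old behaviour), *.idx, or *.bump.
    """
    mtimes = []
    for suffix in CANDIDATE_SUFFIXES:
        try:
            mtimes.append(pack.with_suffix(suffix).stat().st_mtime)
        except FileNotFoundError:
            continue  # that sibling simply isn't there
    if not mtimes:
        raise FileNotFoundError(pack)
    return max(mtimes)
```

Note that taking the maximum only helps the new reader: an old git that
stats just the *.pack still won't notice a bump made via *.idx or
*.bump, which is the mixed-version trade-off asked about above.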