On Thu, Sep 17, 2020 at 09:16:05AM -0400, Jeff King wrote:

> I've also often wondered whether this is necessary. Given the symptom
> of "oops, this object is there but with 0 bytes" after a hard crash
> (power off, etc), my assumption is that the metadata is being
> journaled but the actual data is not. Which would imply this isn't
> needed, but may just be revealing my naive view of how filesystems
> work.
>
> And of course all of my experience is on ext4 (which doubly confuses
> me, because my systems typically have data=ordered, which I thought
> would solve this). Non-journalling filesystems or other modes likely
> behave differently, but if this extra fsync carries a cost, we may
> want to make it optional.

I hope my other mail clarified how this works at a high level; if not,
feel free to ask more questions.

> >  sha1-file.c | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
>
> We already fsync pack files, but we don't fsync their directories. If
> this is important to do, we should be doing it there, too.
>
> We also don't fsync ref files (nor packed-refs) at all. If fsyncing
> files is important for reliability, we should be including those, too.
> It may be tempting to say that the important stuff is in objects and
> the refs can be salvaged from the commit graph, but my experience says
> otherwise. Missing, broken, or mysteriously-rewound refs cause
> confusing user-visible behavior, and when compounded with pruning
> operations like "git gc" they _do_ result in losing objects.

True, this probably needs to be done for the directories of the other
files as well.

One interesting optimization under Linux is the syncfs syscall, which
syncs all files on a file system: if you need to do a large number of
fsyncs that do not depend on each other for transaction semantics, it
can provide a huge speedup.
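
For illustration only (this is not code from the series), a rough
sketch of what fsyncing a newly written file's parent directory and
batching independent flushes with syncfs(2) could look like; the helper
names are made up and error handling is kept minimal:

	/* Hypothetical helpers for illustration, not part of the patch. */
	#define _GNU_SOURCE		/* for syncfs() on Linux */
	#include <fcntl.h>
	#include <unistd.h>

	/*
	 * Make the directory entry of a freshly written file durable by
	 * fsyncing the directory that contains it.
	 */
	static int fsync_parent_dir(const char *dir_path)
	{
		int fd = open(dir_path, O_RDONLY);
		if (fd < 0)
			return -1;
		if (fsync(fd) < 0) {
			close(fd);
			return -1;
		}
		return close(fd);
	}

	/*
	 * Linux-only: flush every dirty file on the filesystem that
	 * contains any_path. Useful when many independent files need to
	 * reach disk and no ordering between them is required.
	 */
	static int sync_filesystem_of(const char *any_path)
	{
		int fd = open(any_path, O_RDONLY);
		if (fd < 0)
			return -1;
		if (syncfs(fd) < 0) {
			close(fd);
			return -1;
		}
		return close(fd);
	}

Note that syncfs() flushes everything dirty on that filesystem, not
just our own writes, so it trades precision for throughput.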