On Mon, Sep 16, 2024 at 11:54:14AM -0400, Taylor Blau wrote: > On Mon, Sep 16, 2024 at 12:45:38PM +0200, Patrick Steinhardt wrote: > > This function compares the exact contents, but isn't that wrong? The > > contents may differ even though the object is the same because the > > object hash is derived from the uncompressed data, whereas we store > > compressed data on disk. > > > > So this might trigger when you have different zlib versions, but also if > > you configure core.compression differently. > > Oh, shoot -- you're right for when we call finalize_object_file() on > loose objects. > > For packfiles and other spots where we use the hashfile API, the > name of the file depends on the checksum'd contents, so we're safe > there. But for loose objects the, the name of the file is based on the > object hash, which is the hash of the uncompressed contents. > > So I think we would need something like this on top: Thinking about this a little more, I think that most cases should actually be OK. Of course, this only affects repositories that are changing their zlib version, and/or changing their core.compression setting regularly, so this is all pretty niche, but still... We often guard calls to write_loose_object() with either calling freshen_loose_object(), or freshen_packed_object(), which will update the mtime of the containing pack or loose object path for a given hash. So if we already have a loose object with hash X in the object store, and we try to write that same object again with a different zlib version and/or core.compression setting, we'll simply call freshen_loose_object() and optimize out the actual object write. Of course, that's not always the case. We might e.g., force an object loose via loosen_packed_objects() which could encounter a TOCTOU race with an incoming object write using a different compression setting. But that seems exceedingly rare to me. I think that we should still take that change in the mail I'm replying to, but I would definitely appreciate others' thoughts here. Thanks, Taylor