On Fri, Jan 03, 2025 at 02:40:58PM -0500, Jeff King wrote: > On Fri, Jan 03, 2025 at 09:19:55AM +0100, Patrick Steinhardt wrote: > > > In a preceding commit, we have adapted `check_collision()` to ignore > > If it's in next and we are building on top, I think we can mention it by > name: 0ad3d65652 (object-file: fix race in object collision check, > 2024-12-30). Okay, good to know. I wasn't sure whether the patches might get rewound when `next` gets rewound. > > the case where either of the colliding files vanishes. This should be > > safe in general when we assume that the contents of these two files were > > the same. But the check is all about detecting collisions, so that > > assumption may be too optimistic. > > I found this a little vague about what "too optimistic" means. ;) > Maybe something like: > > Prior to 0ad3d65652, callers could expect that a successful return > from finalize_object_file() means that either the file was moved into > place, or the identical bytes were already present. If neither of > those happens, we'd return an error. > > Since that commit, if the destination file disappears between our > link() call and the collision check, we'd return success without > actually checking the contents, and without retrying the link. This > solves the common case that the files were indeed the same, but it > means that we may corrupt the repository if they weren't (this implies > a hash collision, but the whole point of this function is protecting > against hash collisions). > > We can't be pessimistic and assume they're different; that hurts the > common case that 0ad3d65652 was trying to fix. But after seeing that > the destination file went away, we can retry linking again... Well, as usual your commit messages are something to aspire to :) Thanks. > > Furthermore, stop treating `ENOENT` specially for the source file. It > > shouldn't happen that the source vanishes as we're using a fresh > > temporary file for it, so if it does vanish it indicates an actual > > error. > > OK. I think this is worth doing, but I'd probably have put it into its > own commit. Fair enough, can do. > > @@ -2034,8 +2037,10 @@ int finalize_object_file(const char *tmpfile, const char *filename) > > int finalize_object_file_flags(const char *tmpfile, const char *filename, > > enum finalize_object_file_flags flags) > > { > > - struct stat st; > > - int ret = 0; > > + int ret; > > + > > +retry: > > + ret = 0; > > > > if (object_creation_mode == OBJECT_CREATION_USES_RENAMES) > > goto try_rename; > > @@ -2056,6 +2061,8 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, > > * left to unlink. > > */ > > if (ret && ret != EEXIST) { > > + struct stat st; > > + > > OK, we move the stat struct here where it's needed. I think that's > somewhat orthogonal to your patch, but reduced scoping does help make > the goto's less confusing. > > I suspect there's a way to write this as a loop that would be more > structured, but it would be a bigger refactor. Bonus points if it also > get rid of the try_rename goto, too. ;) > > I'm OK punting on that, though. Yeah, I'll punt on it for now. I don't love the resulting structure, but it's also not that uncommon in our codebase. > > @@ -2071,9 +2078,13 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename, > > errno = saved_errno; > > return error_errno(_("unable to write file %s"), filename); > > } > > - if (!(flags & FOF_SKIP_COLLISION_CHECK) && > > - check_collision(tmpfile, filename)) > > + if (!(flags & FOF_SKIP_COLLISION_CHECK)) { > > + ret = check_collision(tmpfile, filename); > > + if (ret == CHECK_COLLISION_DEST_VANISHED) > > + goto retry; > > + else if (ret) > > return -1; > > + } > > unlink_or_warn(tmpfile); > > } > > I share Junio's uneasiness with looping forever based on external input > from the filesystem (even though you _should_ eventually win the race, > that's not guaranteed, and of course a weird filesystem might confuse > us). Could we put a stop-gap in it like: Fair enough. I was also wondering about whether or not I should have a retry counter when writing it but couldn't think about a (sane) scenario where it would be needed. But yeah, filesystems can be weird, and it's not a lot of code either, so I'll add it. Patrick