Re: [PATCH 2/2] object-file: retry linking file into place when occluding file vanishes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 03, 2025 at 02:40:58PM -0500, Jeff King wrote:
> On Fri, Jan 03, 2025 at 09:19:55AM +0100, Patrick Steinhardt wrote:
> 
> > In a preceding commit, we have adapted `check_collision()` to ignore
> 
> If it's in next and we are building on top, I think we can mention it by
> name: 0ad3d65652 (object-file: fix race in object collision check,
> 2024-12-30).

Okay, good to know. I wasn't sure whether the patches might get rewound
when `next` gets rewound.

> > the case where either of the colliding files vanishes. This should be
> > safe in general when we assume that the contents of these two files were
> > the same. But the check is all about detecting collisions, so that
> > assumption may be too optimistic.
> 
> I found this a little vague about what "too optimistic" means. ;)
> Maybe something like:
> 
>   Prior to 0ad3d65652, callers could expect that a successful return
>   from finalize_object_file() means that either the file was moved into
>   place, or the identical bytes were already present. If neither of
>   those happens, we'd return an error.
> 
>   Since that commit, if the destination file disappears between our
>   link() call and the collision check, we'd return success without
>   actually checking the contents, and without retrying the link. This
>   solves the common case that the files were indeed the same, but it
>   means that we may corrupt the repository if they weren't (this implies
>   a hash collision, but the whole point of this function is protecting
>   against hash collisions).
> 
>   We can't be pessimistic and assume they're different; that hurts the
>   common case that 0ad3d65652 was trying to fix. But after seeing that
>   the destination file went away, we can retry linking again...

Well, as usual your commit messages are something to aspire to :)
Thanks.

> > Furthermore, stop treating `ENOENT` specially for the source file. It
> > shouldn't happen that the source vanishes as we're using a fresh
> > temporary file for it, so if it does vanish it indicates an actual
> > error.
> 
> OK. I think this is worth doing, but I'd probably have put it into its
> own commit.

Fair enough, can do.

> > @@ -2034,8 +2037,10 @@ int finalize_object_file(const char *tmpfile, const char *filename)
> >  int finalize_object_file_flags(const char *tmpfile, const char *filename,
> >  			       enum finalize_object_file_flags flags)
> >  {
> > -	struct stat st;
> > -	int ret = 0;
> > +	int ret;
> > +
> > +retry:
> > +	ret = 0;
> >  
> >  	if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
> >  		goto try_rename;
> > @@ -2056,6 +2061,8 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename,
> >  	 * left to unlink.
> >  	 */
> >  	if (ret && ret != EEXIST) {
> > +		struct stat st;
> > +
> 
> OK, we move the stat struct here where it's needed. I think that's
> somewhat orthogonal to your patch, but reduced scoping does help make
> the goto's less confusing.
> 
> I suspect there's a way to write this as a loop that would be more
> structured, but it would be a bigger refactor. Bonus points if it also
> get rid of the try_rename goto, too. ;)
> 
> I'm OK punting on that, though.

Yeah, I'll punt on it for now. I don't love the resulting structure, but
it's also not that uncommon in our codebase.

> > @@ -2071,9 +2078,13 @@ int finalize_object_file_flags(const char *tmpfile, const char *filename,
> >  			errno = saved_errno;
> >  			return error_errno(_("unable to write file %s"), filename);
> >  		}
> > -		if (!(flags & FOF_SKIP_COLLISION_CHECK) &&
> > -		    check_collision(tmpfile, filename))
> > +		if (!(flags & FOF_SKIP_COLLISION_CHECK)) {
> > +			ret = check_collision(tmpfile, filename);
> > +			if (ret == CHECK_COLLISION_DEST_VANISHED)
> > +				goto retry;
> > +			else if (ret)
> >  				return -1;
> > +		}
> >  		unlink_or_warn(tmpfile);
> >  	}
> 
> I share Junio's uneasiness with looping forever based on external input
> from the filesystem (even though you _should_ eventually win the race,
> that's not guaranteed, and of course a weird filesystem might confuse
> us). Could we put a stop-gap in it like:

Fair enough. I was also wondering about whether or not I should have a
retry counter when writing it but couldn't think about a (sane) scenario
where it would be needed. But yeah, filesystems can be weird, and it's
not a lot of code either, so I'll add it.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux