On Wed, Jan 01, 2025 at 01:19:52PM -0500, Jeff King wrote: > On Wed, Jan 01, 2025 at 08:50:51AM -0800, Junio C Hamano wrote: > > > Jeff King <peff@xxxxxxxx> writes: > > > > > There is one gotcha here, though. We call this collision check only if > > > we got EEXIST trying to move the tempfile into place. If the destination > > > file then goes away, we can't do the collision check. But is it right to > > > quietly return success? > > > > > > If the contents of the two were the same, that's fine. We don't need the > > > extra copy. > > > > > > But if the contents were not the same, we'd prefer either to actually > > > copy the contents into place, or to return an error. > > > > > > Of course we can't know, because the destination file has gone away. In > > > the common case they will be the same, but the whole point of this check > > > is to allow loosening the cryptographic collision of the packfile > > > contents. So the safest thing would be to retain the tempfile, copying > > > it into the destination file. That errs on the side of keeping data when > > > we cannot make a determination. > > > > > > IOW, if we see ENOENT on filename_b, should we then loop back in the > > > caller to try the link() again? > > > > Yuck, I think you're absolutely right. > > So I think this part, if adjusted as I suggested, would fix the race in > the tests without making anything worse (it's just more code). Hm, okay, I think that makes sense. Instead of being optimistic like my first version was we should be pessimistic and assume the worst. Which I guess is a sensible stance to take in a collision check. > And then this... > > > > I think check_collision() is used _after_ the attempt to rename() into > > > place. So there's a race when the tempfile goes away, but I think the > > > outcome is made a bit worse by your patch. > > > > > > Consider a sequence like this: > > > > > > a. Process A writes tmp_pack_foo. > > > > > > b. Process A tries to link tmp_pack_foo to pack-<hash> but finds it > > > already exists. > > > > > > c. Process A opens both tmp_pack_foo and pack-<hash>. > > > > > > d. Process A compares the two byte-for-byte, and then returns > > > success/failure based on whether they were actually identical. > > > > > > Now imagine there is a process B that deletes the file (maybe an > > > over-zealous "gc --prune=now" deletes the in-use temporary file): > > > > > > - if process B deletes it between steps (a) and (b), process A returns > > > an error (there is nothing to link). The caller knows that the data > > > was not stored. > > > > > > - if process B deletes it between (b) and (c), then before your patch > > > we see an error (because we can't compare the files). After your > > > patch, we continue on and return success. The caller knows the data > > > was stored (via the original file, not our new copy). > > > > > > - if process B deletes it between (c) and (d), then process A has no > > > idea. But at this point it does not matter. If the files were > > > identical, we return success (and in fact, process A deletes the file > > > itself). And if not identical, then we return error, and the callers > > > knows the data was not stored. > > > > > > So even though the exact behavior may depend on where we hit the race, I > > > think ignoring an ENOENT open() error on the tempfile meaningfully > > > changes what happens in the middle case. > > > > > > In practice I don't really expect this to happen, and "gc --prune=now" > > > is inherently risky in a live repository. But I think we're probably > > > better off to continue treating it as an error if we can't open our own > > > tempfile. > > > > So we'd ignore the racy and flaky tests, as hiding the flake by > > ignoring the error would only hurt the real world users. > > ...is all about ignoring ENOENT on the tmpfile itself. And I think we > can just drop that hunk entirely. The tests do not care here (they are > running simultaneous gc, but _not_ a simultaneous "--prune=now" gc). Yeah, we don't need this hunk then. I'll send a follow-up change. Patrick