Taylor Blau <me@xxxxxxxxxxxx> writes: > Note that this may cause some extra computation when the files are in > fact identical, but this should happen rarely. For example, when writing > a loose object, we compute the object id first, then check manually for > an existing object (so that we can freshen its timestamp) before > actually trying to write and link the new data. True. > +static int check_collision(const char *filename_a, const char *filename_b) > +{ > + char buf_a[4096], buf_b[4096]; > + int fd_a = -1, fd_b = -1; > + int ret = 0; > + > + fd_a = open(filename_a, O_RDONLY); > + if (fd_a < 0) { > + ret = error_errno(_("unable to open %s"), filename_a); > + goto out; > + } > + > + fd_b = open(filename_b, O_RDONLY); > + if (fd_b < 0) { > + ret = error_errno(_("unable to open %s"), filename_b); > + goto out; > + } Two and two half comments on this function. * We compare 4k at a time here, while copy.c copies 8k at a time, and bulk-checkin.c uses 16k at a time. Outside the scope of this topic, we probably should pick one number and stick to it, unless we have measured to pick perfect number for each case (and I know I picked 8k for copy.c and 16k for bulk-checkin.c both out of thin air). * I would have expected at least we would fstat() them to declare difference immediately after we find their sizes differ, for example. As we assume that calling into this function should be rare, we prefer not to pay in complexity for performance here? * We use read_in_full() and assume that a short-read return from the function happens only at the end of file due to EOF, which is another reason why we can do away without fstat() on these files. * An error causes the caller to assume collision (because we assign the return value of error() to ret), which should do the same action as an actual collision to abort and keep the problematic file for forensics. > /* > * Move the just written object into its final resting place. > */ > @@ -1941,8 +1992,8 @@ int finalize_object_file(const char *tmpfile, const char *filename) > errno = saved_errno; > return error_errno(_("unable to write file %s"), filename); > } > - /* FIXME!!! Collision check here ? */ > - unlink_or_warn(tmpfile); > + if (check_collision(tmpfile, filename)) > + return -1; > } OK.