Re: 'git add' corrupts repository if the working directory is modified as it runs

Ilari Liusvaara <ilari.liusvaara@xxxxxxxxxxx> · Sat, 13 Feb 2010 15:39:52 +0200

On Sat, Feb 13, 2010 at 06:12:38AM -0600, Jonathan Nieder wrote:
> 
> This leaves me nervous about speed.  Consider the following simple
> case: the file to be added is already in the object
> repository somewhere (maybe the user has tried this code before, or
> a file was renamed with 'mv', or a patch applied with 'patch', or an
> unmount and remount dirtied the stat information).
> 
> With the current code, write_sha1_file() will hash the file, notice
> that the object is already in .git/objects, and return.  With a
> read-hash-copy loop, git would have to store a (compressed or
> uncompressed) copy of the file somewhere in the meantime.

It could be done by first reading the file and computing its hash;
if the hash matches an existing object, return that hash. Otherwise,
read the file again to write the object, hashing it a second time,
and use that second value as the object ID.
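A minimal sketch of this two-pass idea, assuming a hypothetical in-memory object store and using 64-bit FNV-1a as a stand-in for SHA-1 so the example has no external dependencies. `add_file`, `hash_only`, `has_object` and `store` are illustrative names, not git functions; in git the real work happens around write_sha1_file(), and pass 2 would write compressed data to a temporary file rather than a heap buffer. Pass 1 only hashes, so nothing is copied when the object already exists; pass 2 hashes exactly the bytes it stores, so the returned ID always names the stored contents.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for SHA-1 (an assumption for this sketch): 64-bit FNV-1a. */
typedef struct { uint64_t h; } hasher;

static void hasher_init(hasher *c) { c->h = 14695981039346656037ULL; }

static void hasher_update(hasher *c, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c->h ^= p[i];
        c->h *= 1099511628211ULL;
    }
}

/* Hypothetical in-memory object store keyed by object ID. */
#define MAX_OBJECTS 64
struct object { uint64_t id; unsigned char *data; size_t len; };
static struct object store[MAX_OBJECTS];
static size_t nr_objects;

static int has_object(uint64_t id)
{
    for (size_t i = 0; i < nr_objects; i++)
        if (store[i].id == id)
            return 1;
    return 0;
}

/* Pass 1: hash the file without keeping any copy of it. */
static int hash_only(FILE *f, uint64_t *id)
{
    unsigned char buf[4096];
    size_t n;
    hasher c;

    hasher_init(&c);
    rewind(f);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        hasher_update(&c, buf, n);
    if (ferror(f))
        return -1;
    *id = c.h;
    return 0;
}

/* Adds the contents of f; returns 0 and sets *out_id, or -1 on error. */
static int add_file(FILE *f, uint64_t *out_id)
{
    unsigned char buf[4096], *data = NULL, *tmp;
    size_t len = 0, n;
    uint64_t id;
    hasher c;

    assert(f != NULL && out_id != NULL);
    if (hash_only(f, &id) < 0)
        return -1;
    if (has_object(id)) {            /* already stored: nothing to copy */
        *out_id = id;
        return 0;
    }

    /* Pass 2: re-read, hashing exactly the bytes that get stored. */
    hasher_init(&c);
    rewind(f);
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        tmp = realloc(data, len + n);
        if (!tmp) {
            free(data);
            return -1;
        }
        data = tmp;
        memcpy(data + len, buf, n);
        hasher_update(&c, buf, n);
        len += n;
    }
    if (ferror(f)) {
        free(data);
        return -1;
    }
    /* The ID comes from pass 2, so it always names the bytes actually
       stored, however the file changed while it was being read. */
    if (has_object(c.h)) {
        free(data);
    } else {
        if (nr_objects == MAX_OBJECTS) {
            free(data);
            return -1;
        }
        store[nr_objects].id = c.h;
        store[nr_objects].data = data;
        store[nr_objects].len = len;
        nr_objects++;
    }
    *out_id = c.h;
    return 0;
}
```

If pass 1 finds a match, the cost is a single read and hash, the same as the current behaviour quoted above; only a miss pays for the second read and hash.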

This would require two hash computations when the object does not
already exist, but SHA-1 is pretty fast. If the first computation
produces a match, it doesn't matter whether the file is being
modified, since adding and modifying in parallel yields undefined
contents anyway (the only requirement is that it must not corrupt
the repository).

-Ilari