Re: [PATCH] git-add: allow --ignore-missing always, not just in dry run

Dieter Plaetinck <dieter@xxxxxxxxxxxx> · Fri, 20 Jan 2012 19:14:51 +0100

On Thu, 19 Jan 2012 13:26:40 -0800
Junio C Hamano <gitster@xxxxxxxxx> wrote:

> Dieter Plaetinck <dieter@xxxxxxxxxxxx> writes:
> 
> > So basically, if this tool needs to check which files
> > still/no-longer exist before calling git-add, that's vulnerable to
> > race conditions.
> 
> I do not think you are solving the real problem in your script even
> if you allowed "add --ignore-missing".
> 
> I suspect you are making things even worse by using
> "--ignore-missing" in your script. If a user is actively updating the
> files in the filesystem, at least "git add" without
> "--ignore-missing" would catch the case where you _thought_ the user
> modified but still has the file, but in reality the further updates
> in the working tree removed the file, which is a clear indication
> that the rate you are processing the notify stream is slower than the
> activity generated by the user and allows you to notice that you may
> be better off waiting a bit until things calm down before running
> your automated commit.

I don't understand what you mean. if this happened:
1) modify file
2) modify file
3) remove file
but my script tries to git add --ignore-missing, when 3) already happened
but the script only sees event 2)
then it will not fail, then catch the 3rd event, then do a git rm.
if however, i don't enable ignore-missing, there's a failure on the git add.
I guess what you're saying is "better to get an error, interpret it as `this may not be a "real" error`, just continue"

> 
> Also, with or without "--ignore-missing", I think we have safety
> valves to cause "git add" fail if the file being added is updated
> while git is working on it (i.e. we read and compute the object name,
> and then store it compressed, and check the hash of what is stored
> matches the object name we computed earlier, which would fail if the
> file is updated in the middle at the right time).
> 
> This means that the "--ignore-missing" option will _not_ eliminate all
> cases where "git add" may detect an error and fails. In other words,
> your script needs to deal with error return from "git add" anyway
> even if we applied your patch and you used "--ignore-missing" in your
> script.
> 
> I have to say that the basic premise of your script is simply broken,
> and I am afraid that it is unfixable without an atomic snapshot
> support from the underlying filesystem (i.e. take a snapshot, run
> 'git add' on it, and then release the snapshot).

I assumed that `git add` works atomically. (as in: if you git add a file and
modify that file at the same time, either the old or the new version will be added
to the index, but always successfully).
If I understand this correctly, `git add` can only be successful if at least the file
remains untouched while the git command runs.
This means I should change my entire approach and be aware that git may return failures when the user is changing files while we are adding to the index. 
Maybe I should do something like:
once >0 inotify events happened,
* run git status, for each file in git status, if we've seen events, try to add file to the index.
if the above fails, wait a few seconds, then try again. if failures persist a few times in a row, only then we can be reasonbly sure something is really wrong.

but there will always be the case where a file gets deleted and (re)added, there is always the risk for races:
1) my script runs "git rm" after seeing a "delete" event but the file has been added in the meanwhile... git removes the file => bad
2) my script runs "git add" after seeing a new/changed file but the file has been removed in the meanwhile... git gives an error.
and in the latter case you can't just apply the "just retry" trick I mentioned above because the next event is delete, which would trigger a "git rm",
potentially causing unrecoverable data loss as described in 1).

one example of such things happening in real life is vim which creates (and removes) tmp files called `4913`.
We could just add such filenames to gitignore,
but that seems like a bit of a burden which shouldn't be needed.

To avoid the risks mentioned above and the "file can be altered during git add command",
what i'm really looking for is a way to, for a given file (file for which I've seen inotify events):
synchronize changes of that file to the index even if the file was unknown to git, or doesn't exist anymore;
and do that in an atomic way (or: i'll just retry a few times until it fails consistently, but removal race conditions can lead to data loss)

I tried `git update-index` as you suggested but couldn't have it behave like that.
And something like `git add --all` can still cause a file removal when it shouldn't, and lead to uncoverable data loss on race conditions.

Sorry for the long mail, I tried to keep it as minimal as possible.

thanks!
Dieter
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html