On Thu, 19 Jan 2012 13:26:40 -0800 Junio C Hamano <gitster@xxxxxxxxx> wrote: > Dieter Plaetinck <dieter@xxxxxxxxxxxx> writes: > > > So basically, if this tool needs to check which files > > still/no-longer exist before calling git-add, that's vulnerable to > > race conditions. > > I do not think you are solving the real problem in your script even > if you allowed "add --ignore-missing". > > I suspect you are making things even worse by using > "--ignore-missing" in your script. If a user is actively updating the > files in the filesystem, at least "git add" without > "--ignore-missing" would catch the case where you _thought_ the user > modified but still has the file, but in reality the further updates > in the working tree removed the file, which is a clear indication > that the rate you are processing the notify stream is slower than the > activity generated by the user and allows you to notice that you may > be better off waiting a bit until things calm down before running > your automated commit. I don't understand what you mean. if this happened: 1) modify file 2) modify file 3) remove file but my script tries to git add --ignore-missing, when 3) already happened but the script only sees event 2) then it will not fail, then catch the 3rd event, then do a git rm. if however, i don't enable ignore-missing, there's a failure on the git add. I guess what you're saying is "better to get an error, interpret it as `this may not be a "real" error`, just continue" > > Also, with or without "--ignore-missing", I think we have safety > valves to cause "git add" fail if the file being added is updated > while git is working on it (i.e. we read and compute the object name, > and then store it compressed, and check the hash of what is stored > matches the object name we computed earlier, which would fail if the > file is updated in the middle at the right time). > > This means that the "--ignore-missing" option will _not_ eliminate all > cases where "git add" may detect an error and fails. In other words, > your script needs to deal with error return from "git add" anyway > even if we applied your patch and you used "--ignore-missing" in your > script. > > I have to say that the basic premise of your script is simply broken, > and I am afraid that it is unfixable without an atomic snapshot > support from the underlying filesystem (i.e. take a snapshot, run > 'git add' on it, and then release the snapshot). I assumed that `git add` works atomically. (as in: if you git add a file and modify that file at the same time, either the old or the new version will be added to the index, but always successfully). If I understand this correctly, `git add` can only be successful if at least the file remains untouched while the git command runs. This means I should change my entire approach and be aware that git may return failures when the user is changing files while we are adding to the index. Maybe I should do something like: once >0 inotify events happened, * run git status, for each file in git status, if we've seen events, try to add file to the index. if the above fails, wait a few seconds, then try again. if failures persist a few times in a row, only then we can be reasonbly sure something is really wrong. but there will always be the case where a file gets deleted and (re)added, there is always the risk for races: 1) my script runs "git rm" after seeing a "delete" event but the file has been added in the meanwhile... git removes the file => bad 2) my script runs "git add" after seeing a new/changed file but the file has been removed in the meanwhile... git gives an error. and in the latter case you can't just apply the "just retry" trick I mentioned above because the next event is delete, which would trigger a "git rm", potentially causing unrecoverable data loss as described in 1). one example of such things happening in real life is vim which creates (and removes) tmp files called `4913`. We could just add such filenames to gitignore, but that seems like a bit of a burden which shouldn't be needed. To avoid the risks mentioned above and the "file can be altered during git add command", what i'm really looking for is a way to, for a given file (file for which I've seen inotify events): synchronize changes of that file to the index even if the file was unknown to git, or doesn't exist anymore; and do that in an atomic way (or: i'll just retry a few times until it fails consistently, but removal race conditions can lead to data loss) I tried `git update-index` as you suggested but couldn't have it behave like that. And something like `git add --all` can still cause a file removal when it shouldn't, and lead to uncoverable data loss on race conditions. Sorry for the long mail, I tried to keep it as minimal as possible. thanks! Dieter -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html