Re: [PATCH v7 2/2] Verify index file before we opportunistically update it

Junio C Hamano <gitster@xxxxxxxxx> · Sat, 12 Apr 2014 00:05:45 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> What is the race under discussion about?  It is about the index,
> which corresponds one-to-one to the working tree, so in order for
> the "race" to matter, you need to be racing against another process
> that is not cooperating with you (e.g. a continuous and uncontrolled
> "git add" updating the index while you are doing a real work),
> mucking with the index in the same working tree.  How could such a
> workflow any useful in the real life?
>
> In the spectrum between useful and insane, there is a point where we
> should just tell the insane: don't do it then.
>
> Having said that...
>
>>> So in that sense that can be done as a less urgent follow-up for this topic.
>>
>> Yeah if racing at refresh time is a real problem, sure we should solve it first.
>
> ... In order to solve the write-vs-write competition in a useful
> way, you must serialize the competing accesses.  E.g. "git add"
> would first take a write lock on the index before read_cache(), and
> then do its operation and finally release the lock by the usual
> write-to-lock-close-then-rename dance.

Extending on this a bit.

Here I didn't mean the traditional ".lock" we create via the
lockfile() API.  Rather, we would want to use fcntl/flock style lock
that lets others _wait_, not _fail_.  Because...

> The lazy "read and refresh in-core first, hoping that we did not
> compete with anybody, and then check just before writing out after
> taking the lock" is a very good solution for the opportunistic
> update codepath, because it is an option not to write the result out
> when there actually was an update by somebody else.  But such an
> opportunistic locking scheme does not work for write-vs-write
> competition.  Upon a real conflict, you need to fail the entire
> operation.

...having multiple conflicting writers on the single index file is
what you thought about worth protecting against.  When somebody else
is pounding on the index file you are trying to prepare your next
commmit in, with his writing that can unexpectedly overwrite what
you prepared, you would at least want the accesses serialized,
without getting half of your attempt to "git add" fail (and having
to redo).  For that, you would want your "git add" to wait while the
other party is mucking with the index under lock, instead of
failing, which is what you would get from the traditional lockfile
API based locking.

But that still takes us back to the "don't do it then".  It is true
that, with serialization, you may be able to guarantee that one "git
add" you do would start from one state (which may record the state
of your previous "git add", or which may have further been modified
by the other process) and ends with whatever that "git add" added,
without any other change.  But even in that case, when you finally
do a "git commit", can you say you know what is in the index?  I do
not think so.  After all, the root cause of the "race" issue is that
the other process pounds at the same index while you are working on
it, without any coordination with you.

So...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html