Re: [PATCH] git: add --no-optional-locks option

Daniel Santos <daniel.santos@xxxxxxxxx> · Fri, 22 Sep 2017 01:42:10 -0500

On 09/20/2017 11:32 PM, Jeff King wrote:
> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)
>
> I expanded the scope a bit to let us give the same treatment to more
> commands in the long run.  I'd also be OK with just cherry-picking your
> patch to non-Windows Git if you don't find my reasoning below
> compelling. But I think we need _something_ like this, as the other
> solutions I could come up with don't seem very promising.
>
> -Peff
>
> -- >8 --
> Some tools like IDEs or fancy editors may periodically run
> commands like "git status" in the background to keep track
> of the state of the repository. Some of these commands may
> refresh the index and write out the result in an
> opportunistic way: if they can get the index lock, then they
> update the on-disk index with any updates they find. And if
> not, then their in-core refresh is lost and just has to be
> recomputed by the next caller.
>
> But taking the index lock may conflict with other operations
> in the repository. Especially ones that the user is doing
> themselves, which _aren't_ opportunistic. In other words,
> "git status" knows how to back off when somebody else is
> holding the lock, but other commands don't know that status
> would be happy to drop the lock if somebody else wanted it.

Interestingly, this usually slaps me when performing an _interactive_
rebase.  It occurred to me that if I'm performing an interactive
operation, it doesn't seem unreasonable for git to wait up to 125ms or
so for the lock and then prompt the user to ask if they want to
continue waiting for the lock.
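
To make the idea concrete, here is a rough sketch in Python (not git's
C, and not how git's lockfile code works today).  The names try_lock
and lock_interactively are made up for illustration, and I'm assuming
the lock is a plain file created exclusively:

```python
import os
import time

def try_lock(path, timeout=0.125, poll=0.005):
    """Try to create `path` exclusively, retrying for up to `timeout` seconds.

    Returns an open file descriptor on success, or None if the lock
    file still exists when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # i.e. somebody else is holding the lock.
            return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o666)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return None
            time.sleep(poll)

def lock_interactively(path, ask=input, timeout=0.125):
    """Wait briefly for the lock, then ask the user whether to keep waiting."""
    while True:
        fd = try_lock(path, timeout)
        if fd is not None:
            return fd
        answer = ask(f"{path} is held by another process; keep waiting? [y/n] ")
        if answer.strip().lower() != "y":
            return None
```

A non-interactive caller would just call try_lock() and give up on
None; only an interactive command would go through the prompt.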

> There are a couple possible solutions:
>
>   1. Have some kind of "pseudo-lock" that allows other
>      commands to tell status that they want the lock.
>
>      This is likely to be complicated and error-prone to
>      implement (and maybe even impossible with just
>      dotlocks to work from, as it requires some
>      inter-process communication).
>
>   2. Avoid background runs of commands like "git status"
>      that want to do opportunistic updates, preferring
>      instead plumbing like diff-files, etc.
>
>      This is awkward for a couple of reasons. One is that
>      "status --porcelain" reports a lot more about the
>      repository state than is available from individual
>      plumbing commands. And two is that we actually _do_
>      want to see the refreshed index. We just don't want to
>      take a lock or write out the result. Whereas commands
>      like diff-files expect us to refresh the index
>      separately and write it to disk so that they can depend
>      on the result. But that write is exactly what we're
>      trying to avoid.
>
>   3. Ask "status" not to lock or write the index.
>
>      This is easy to implement. The big downside is that any
>      work done in refreshing the index for such a call is
>      lost when the process exits. So a background process
>      may end up re-hashing a changed file multiple times
>      until the user runs a command that does an index
>      refresh themselves.

That is not necessarily the case.  I don't actually know git on the
inside, but I would ask you to consider a read-write lock and a hybrid
of one and three.

I don't know what dotlocks are, but I'm certain that you can implement a
rw lock using lock files and no other IPC, although it does increase the
complexity.  The way this works is that `git status' acquires a read
lock and does its thing.  If it has real changes, instead of discarding
them it attempts to upgrade to a write lock.  If that fails, you throw
them away; otherwise you write them and release.

In order to implement rw locks with only lock files, "off the cuff" I'd
say you have a single "lock list" file that should never be deleted and
a "lock lock" file that is held in order to read or modify the list.
The format of the lock list would have a pair of 32-bit wrapping
modification counts (or versions) at the top -- one for modifications to
the lock list itself and another for modifications to the underlying
data (i.e., the number of times a write lock has been acquired).  This
header is followed by entries something like this:

<operation> <pid> <version> <timestamp>

<operation>  'r' if waiting for a read lock
             'R' if actively reading
             'w' if waiting for write lock
             'W' if actively writing
<pid>        The pid
<version>    If active, the version of the data at the time lock acquired or zero.
<timestamp>  Time began waiting or time lock acquired

An operation of 'r' or 'w' means that you are waiting and upper case
means that you are active.  <version> is the version of the data at the
time the lock became active and writers increment it when they acquire a
lock.  You wait with file alteration notification on the lock list (if
there is any doubt based upon timestamp precision then you can examine
the lock list version).  When you want to read or write, you lock the
lock-lock (this sounds like a joke... "Lock, lock.", "Who's there?",
"Reader.", "Reader who?"....) and examine the lock list.  If it's empty,
you add yourself as an active reader or writer with an upper case 'R' or
'W' and release the lock-lock.  If there are only readers, and you want
to read, you add yourself as an active reader.  The version of the lock
list is incremented every time it is modified.
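
As a sketch of the lock list format described above, here is how it
might be read and written in Python.  The details are my assumptions,
not a spec: the header is one line holding the two counters in
decimal, each entry is one whitespace-separated line, and the
timestamp is whole seconds since the epoch.  All names are
hypothetical.

```python
from dataclasses import dataclass

WRAP = 2 ** 32  # both version counters are 32-bit and wrap around

@dataclass
class Entry:
    op: str         # 'r'/'w' = waiting, 'R'/'W' = active
    pid: int
    version: int    # data version when the lock became active, else 0
    timestamp: int  # time began waiting, or time lock was acquired

def bump(version):
    """Increment a wrapping 32-bit version counter."""
    return (version + 1) % WRAP

def parse_lock_list(text):
    """Return (list_version, data_version, entries) from the file contents."""
    lines = text.splitlines()
    list_version, data_version = (int(field) for field in lines[0].split())
    entries = []
    for line in lines[1:]:
        if not line.strip():
            continue
        op, pid, version, timestamp = line.split()
        if op not in ("r", "R", "w", "W"):
            raise ValueError(f"unknown operation: {op!r}")
        entries.append(Entry(op, int(pid), int(version), int(timestamp)))
    return list_version, data_version, entries

def format_lock_list(list_version, data_version, entries):
    """Serialize the header and entries back into the file contents."""
    body = "".join(
        f"{e.op} {e.pid} {e.version} {e.timestamp}\n" for e in entries
    )
    return f"{list_version} {data_version}\n{body}"
```

Whoever holds the lock-lock would parse the file, change the entries,
call bump() on the list version, and write the result back before
releasing the lock-lock.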

Read-write locks need to be given a priority policy of either readers,
writers, fifo or don't-care.  In this case that should probably be
writers.  So if you want to read and there is an active or waiting
writer, the prospective reader would either add themselves with an 'r'
or fail if they don't want to wait.  If a process wants to write and
there are active readers or writers, it adds itself to the list as
waiting or fails as well.  When all active readers have exited,
whichever prospective writer gets the lock-lock first can make itself
the active writer.  When a process acquires a write lock, it increases
the data version number.  If a reader tries to upgrade to a write lock
but the data version has changed, then it fails.
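
The writer-priority rules above can be modelled in a few lines of
Python.  This is only an in-memory sketch: each method stands for one
critical section done while holding the lock-lock, and the file I/O
and change notification are left out.  The class and method names are
made up, and two points are my own assumptions: an upgrade fails
instead of waiting when other readers are still active, and a writer
records the data version after incrementing it.

```python
WRAP = 2 ** 32  # the version counters are 32-bit and wrap around

class LockList:
    """In-memory model of the lock list (hypothetical, for illustration)."""

    def __init__(self):
        self.list_version = 0
        self.data_version = 0
        self.entries = {}  # pid -> [operation, version]

    def _set(self, pid, op, version):
        # Every modification of the list bumps the list version.
        self.entries[pid] = [op, version]
        self.list_version = (self.list_version + 1) % WRAP
        return op

    def _others(self, pid):
        return [e[0] for p, e in self.entries.items() if p != pid]

    def request_read(self, pid, wait=True):
        # Writer priority: an active *or waiting* writer blocks new readers.
        if any(op in ("w", "W") for op in self._others(pid)):
            return self._set(pid, "r", 0) if wait else None
        return self._set(pid, "R", self.data_version)

    def request_write(self, pid, wait=True):
        # A writer must wait for every active reader or writer to finish.
        if any(op in ("R", "W") for op in self._others(pid)):
            return self._set(pid, "w", 0) if wait else None
        self.data_version = (self.data_version + 1) % WRAP
        return self._set(pid, "W", self.data_version)

    def upgrade(self, pid):
        # Only an active reader whose view of the data is current may upgrade.
        op, seen = self.entries[pid]
        if op != "R" or seen != self.data_version:
            return False
        # Assumption: fail rather than wait if anyone else is still active.
        if any(o in ("R", "W") for o in self._others(pid)):
            return False
        self.data_version = (self.data_version + 1) % WRAP
        self._set(pid, "W", self.data_version)
        return True

    def release(self, pid):
        del self.entries[pid]
        self.list_version = (self.list_version + 1) % WRAP
```

A waiting process would call request_read() or request_write() again
each time it is notified that the lock list changed; `git status'
would call upgrade() and simply drop its changes on False.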

Is there not already a library somewhere that does this?  Either way,
your current effort seems like a step in the right direction -- and
thanks for that!

Daniel