Re: [PATCH v14 00/21] index-helper/watchman

Duy Nguyen <pclouds@xxxxxxxxx> · Thu, 14 Jul 2016 17:56:30 +0200

On Wed, Jul 13, 2016 at 11:59 PM, David Turner <novalis@xxxxxxxxxxx> wrote:
> On 07/12/2016 02:24 PM, Duy Nguyen wrote:
>>
>> Just thinking out loud. I've been thinking more about this.
>> After the move from signal-based to unix socket for communication, we
>> probably are better off with a simpler design than the shm-alike one
>> we have now.
>>
>> What if we send everything over a socket or a pipe? Sending 500MB over
>> a unix socket takes 253ms, which is insignificant when operations on
>> an index that size usually take seconds. If we send everything over a
>> socket/pipe, we can trust data integrity and don't have to verify
>> anything, not even the trailing SHA-1 in the shm file.
>
>
> I think it would be good to make index operations not take seconds.
>
> In general, we should not need to verify the trailing SHA-1 for shm data.
> So the index-helper verifies it when it loads it, but the git (e.g.) status
> should not need to verify.
>
> Also, if we have two git commands running at the same time, the index-helper
> can only serve one at a time; with shm, both can run at full speed.

We still have the option of sending git a path (to shm, possibly) that
it can pick up while skipping verification. If we can exchange
capabilities, then handing the index over some other way is always
possible.

>> So, what I have in mind is this: at index read time, instead of
>> opening a socket, we run a separate program and communicate via pipes.
>> We can exchange capabilities if needed, then the program sends back
>> the entire current index and the list of updated files (and/or the
>> list of dirs to invalidate). The design looks very much like a
>> smudge/clean filter.
>
>
> This seems very complicated.  Now git status talks to the separate program,
> which talks to the index-helper, which talks to watchman.  That is a lot of
> steps!

I was suggesting this because I think it would simplify things, not
complicate them further. Yes, the separate program plays the role of
our unix socket client if we keep the index-helper. But we don't have
to keep it.

Do you remember Junio once suggested putting the index on tmpfs? That's
what I imagine for common, medium-scale setups. We don't need an extra
daemon:

1) when git needs the index, the script looks at its tmpfs mount and,
if the index is found there, passes the path back
2) when git announces the index has been updated, the script reads the
index and saves it in tmpfs
3) when git refreshes and asks for watchman support, the script simply
runs the "watchman" command, post-processes the output a bit and sends
the file list to git
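
To make the three steps concrete, here is a rough, untested sketch of
what such a script could look like, written in Python. Everything about
the interface is an assumption for illustration only: git has no such
hook today, the function names are made up, /dev/shm is just one
possible tmpfs mount, and the shape of the "watchman since" reply
(a "clock" string plus a "files" list of objects with a "name" field)
is what I understand watchman to return, not something git defines.

```python
#!/usr/bin/env python3
"""Sketch of a daemon-less index helper script (hypothetical interface)."""
import hashlib
import json
import os
import shutil
import subprocess

CACHE_DIR = "/dev/shm/git-index-cache"  # assumed tmpfs location


def cache_path(repo, cache_dir=CACHE_DIR):
    # One cached index per repository, keyed by a hash of its absolute path.
    key = hashlib.sha1(os.path.abspath(repo).encode()).hexdigest()
    return os.path.join(cache_dir, key + ".index")


def lookup_index(repo, cache_dir=CACHE_DIR):
    # Step 1: return the path of the cached index, or None if there is none.
    path = cache_path(repo, cache_dir)
    return path if os.path.exists(path) else None


def store_index(repo, index_file, cache_dir=CACHE_DIR):
    # Step 2: copy the freshly written index into tmpfs.
    os.makedirs(cache_dir, exist_ok=True)
    dest = cache_path(repo, cache_dir)
    tmp = dest + ".tmp"
    shutil.copyfile(index_file, tmp)
    os.replace(tmp, dest)  # rename so readers never see a partial copy
    return dest


def parse_watchman_output(raw):
    # Step 3 (post-processing): reduce the JSON reply to (clock, file names).
    reply = json.loads(raw)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    names = sorted(f["name"] for f in reply.get("files", []))
    return reply["clock"], names


def changed_files(repo, clock):
    # Step 3 (query): ask watchman what changed since the saved clock.
    # The clock is the opaque string git would keep in the WAMA extension.
    result = subprocess.run(
        ["watchman", "since", repo, clock],
        check=True, capture_output=True, text=True,
    )
    return parse_watchman_output(result.stdout)
```

How git would invoke these (as subcommands, over a pipe, or otherwise)
is left open on purpose; the point is only that each step is a few
lines and none of them needs a long-running process.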

Because there is no separate daemon in this case, we don't need
--kill and we don't need --autorun. We still need the WAMA extension,
but it can contain just an arbitrary clock string that is completely
opaque to git. If we can get rid of the index-helper (with an example
script probably landing in the contrib folder), that's a lot less
headache down the road.

For giant-scale repos, you probably want something more efficient than
a script like this. And the good thing is you have the freedom to do
whatever you want. You can run one daemon per repo, you can run one
daemon per system... In some previous mail exchange with Dscho, it was
mentioned that something other than watchman may be desired. This
opens up that door without much headache from outside.

> I think the daemon also has the advantage that it can reload the index as
> soon as it changes.  This is not quite implemented, but it would be pretty
> easy to do.  That would save a lot of time in the typical workflow.

A script has the same advantage, provided git notifies it (as we do
now). You could also do it with a watchman trigger, which does not
need any special support from git.
-- 
Duy