Re: Watchman support for git

On Sun, May 4, 2014 at 3:49 AM, David Turner <dturner@xxxxxxxxxxxxxxxx> wrote:
> On Sat, 2014-05-03 at 15:49 +0700, Duy Nguyen wrote:
>> On Sat, May 3, 2014 at 11:39 AM, David Turner <dturner@xxxxxxxxxxxxxxxx> wrote:
>> >> Index v4 and split index (and the following read-cache daemon,
>> >> hopefully)
>> >
>> > Looking at some of the archives for read-cache daemon, it seems to be
>> > somewhat similar to watchman, right?  But I only saw inotify code; what
>> > about Mac OS?  Or am I misunderstanding what it is?
>>
>> It's mentioned in [1], the second paragraph, mostly to hide index I/O
>> read cost and the SHA-1 hashing cost in the background. In theory it
>> should work on all platforms that support multiple processes and
>> efficient IPC. It can help load watchman file cache faster too.
>
> Yes, that seems like a good idea.
>
> I actually wrote some of a more-complicated, weirder version of this
> idea.  In my version, there was a long-running git daemon process that
> held the index, the watchman file cache, and also objects loaded from
> the object database.  Other git commands would then send their
> command-line and arguments over to the daemon, which would run the
> commands and send stdin/out/err back.  Of course, this is complicated
> because git commands are designed to run then exit, so they often rely
> on variables being initialized to zero, or fail to free memory.  I used
> the Boehm GC to handle the memory freeing problem.  To handle variables
> that needed to be reinitialized, I used __attribute__((section(...))) to put
> them all into one section, which I could save on daemon startup and
> restore after each command.  I also replaced calls to exit() with a
> function that called longjmp() so the daemon could survive commands
> failing.  Had I continued, I would also have had to track open file
> descriptors to avoid leaking those.
>
> This was a giant mess that only sort-of worked: it was difficult to
> track down all of the variables that needed to be reinitialized.
>
> The advantage of my method is that there was somewhat less data to
> marshal over IPC, and that objects could be easily cached; the
> disadvantage is complexity, massive code changes, and the fact that it
> still didn't fully work when I ran out of time.
>
> So I'm really looking forward to trying your version!

Hm... I may run into the same problem if I'm not careful. So far I
think the daemon would only hold index data (in the on-disk format,
not the in-memory one), mainly to cut out the SHA-1 hashing cost. This
is still at the idea stage for me, though; nothing is implemented yet.
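The two hard parts of the long-running daemon described above
(reinitializing globals between commands, and surviving a command
that calls exit()) can be sketched in a few lines of C. This is a
hypothetical, simplified illustration, not code from either patch
series: instead of collecting globals in a linker section, it gathers
them into one struct, snapshots that struct once at startup, restores
it after every command, and replaces exit() with a longjmp() back to
the dispatch loop. The names command_state, daemon_init(),
daemon_exit(), cmd_toy() and run_command() are invented for the sketch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/*
 * Hypothetical per-command globals. The approach described above put
 * such variables in a dedicated linker section; a struct keeps this
 * sketch portable.
 */
struct command_state {
	int verbose;
	int files_seen;
};

static struct command_state state;         /* mutated by commands */
static struct command_state startup_state; /* snapshot taken once */
static int snapshot_taken;

static jmp_buf command_jmp;
static int command_exit_code;

/* Call once when the daemon starts, before serving any command. */
static void daemon_init(void)
{
	startup_state = state;
	snapshot_taken = 1;
}

/* Stand-in for exit(): unwind to run_command() instead of killing
 * the daemon process. */
static void daemon_exit(int code)
{
	command_exit_code = code;
	longjmp(command_jmp, 1);
}

/* Toy command: dirties global state and fails when asked to. */
static void cmd_toy(int should_fail)
{
	state.verbose = 1;
	state.files_seen += 10;
	if (should_fail) {
		fprintf(stderr, "fatal: toy command failed\n");
		daemon_exit(128);
	}
}

/* Run one command and return its exit status. The globals are reset
 * to the startup snapshot whether the command returned or "exited". */
static int run_command(int should_fail)
{
	int code = 0;

	assert(snapshot_taken);
	if (setjmp(command_jmp) == 0)
		cmd_toy(should_fail);
	else
		code = command_exit_code;
	state = startup_state;
	return code;
}
```

A daemon would call daemon_init() once and then run_command() per
request. As the message notes, the hard part in real git is finding
every such variable; heap memory and open file descriptors need the
same kind of per-command cleanup.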

> I would like to merge the feature into master.  It works well for me
> and for some of my colleagues who have tried it out.

Have you tried turning watchman on by default and then running the
git test suite? That usually helps.

> I can split the vmac patch into two, but one of them will remain quite
> large because it contains the code for VMAC and AES, which total a bit
> over 100k.  Since the list will probably reject that, I'll post a link
> to a repository containing the patches.

With the read-cache daemon, I think hashing cost is less of an issue,
so a new hashing algorithm becomes less important. If you store the
file cache in the daemon's memory only, there's no need to hash
anything. But I guess you have already tried this.
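Why a daemon makes the hashing cost matter less can be shown with a
small, hypothetical C sketch: the daemon keeps a file's contents in
memory, pays for a full-file hash only when the file's stat data
changes, and otherwise answers from memory after a single stat()
call. FNV-1a stands in for SHA-1 here, and struct cached_file and
cached_load() are invented names, not part of git.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical in-memory copy of an index-like file kept by a daemon. */
struct cached_file {
	char *data;
	size_t size;
	struct stat st;       /* stat data when data was last hashed */
	uint32_t checksum;
	int valid;
	unsigned hash_count;  /* how often the full-file hash was paid */
};

/* FNV-1a, standing in for the SHA-1 over the whole index: O(size). */
static uint32_t fnv1a(const char *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Second-granularity mtime misses same-size rewrites within one
 * second; git deals with this as the "racy git" problem. */
static int stat_unchanged(const struct stat *a, const struct stat *b)
{
	return a->st_dev == b->st_dev && a->st_ino == b->st_ino &&
	       a->st_size == b->st_size && a->st_mtime == b->st_mtime;
}

/* Make c hold path's current contents. Returns 0 on success, -1 on
 * error. Unchanged files cost one stat() and no hashing. */
static int cached_load(struct cached_file *c, const char *path)
{
	struct stat st;
	FILE *f;
	char *buf;
	size_t len;

	assert(c && path);
	if (stat(path, &st) < 0)
		return -1;
	if (c->valid && stat_unchanged(&c->st, &st))
		return 0;

	len = (size_t)st.st_size;
	f = fopen(path, "rb");
	if (!f)
		return -1;
	buf = malloc(len ? len : 1);
	if (!buf || fread(buf, 1, len, f) != len) {
		free(buf);
		fclose(f);
		return -1;
	}
	fclose(f);

	free(c->data);
	c->data = buf;
	c->size = len;
	c->st = st;
	c->checksum = fnv1a(buf, len);
	c->hash_count++;
	c->valid = 1;
	return 0;
}
```

Repeated cached_load() calls on an unchanged file leave hash_count
untouched, so the hashing cost is paid once per change rather than
once per git command.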

> I'm not 100% sure how to split the watchman patch up.  I could add the
> fs_cache code and then separately add the watchman code that populates
> the cache.  Do you think there is a need to divide it up beyond this?

I'll need to have a closer look at your patches to give any
suggestions. Although, if you don't mind waiting a bit, I can try to
get my untracked cache patches into good shape (hopefully in 2 weeks);
then you can mostly avoid touching dir.c and reuse my work.
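The split David proposes (fs_cache first, then the watchman code that
populates it) works best if the cache exposes a single update entry
point that knows nothing about watchman. A minimal, hypothetical C
sketch of that layering follows; fs_cache_update(), struct
watch_record and the other names are invented for illustration and
are not taken from David's patches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- Patch 1 (hypothetical): watcher-agnostic fs_cache ---- */
struct fs_cache_entry {
	char *path;
	long mtime;
	long size;
	int exists;
};

struct fs_cache {
	struct fs_cache_entry *entries;
	size_t nr, alloc;
};

/* Linear search keeps the sketch short; a real cache would hash. */
static struct fs_cache_entry *fs_cache_find(struct fs_cache *fc,
					    const char *path)
{
	for (size_t i = 0; i < fc->nr; i++)
		if (!strcmp(fc->entries[i].path, path))
			return &fc->entries[i];
	return NULL;
}

/* The only way data enters the cache; it knows nothing about watchman. */
static int fs_cache_update(struct fs_cache *fc, const char *path,
			   int exists, long mtime, long size)
{
	struct fs_cache_entry *e;

	assert(fc && path);
	e = fs_cache_find(fc, path);
	if (!e) {
		size_t len = strlen(path) + 1;
		char *copy = malloc(len);

		if (!copy)
			return -1;
		if (fc->nr == fc->alloc) {
			size_t n = fc->alloc ? 2 * fc->alloc : 8;
			struct fs_cache_entry *p =
				realloc(fc->entries, n * sizeof(*p));
			if (!p) {
				free(copy);
				return -1;
			}
			fc->entries = p;
			fc->alloc = n;
		}
		memcpy(copy, path, len);
		e = &fc->entries[fc->nr++];
		e->path = copy;
	}
	e->exists = exists;
	e->mtime = mtime;
	e->size = size;
	return 0;
}

/* ---- Patch 2 (hypothetical): glue that feeds watcher results in ---- */
/* Fields loosely modeled on watchman's name/exists/mtime/size; a real
 * version would parse them from watchman's JSON response. */
struct watch_record {
	const char *name;
	int exists;
	long mtime;
	long size;
};

static int fs_cache_apply_records(struct fs_cache *fc,
				  const struct watch_record *recs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (fs_cache_update(fc, recs[i].name, recs[i].exists,
				    recs[i].mtime, recs[i].size) < 0)
			return -1;
	return 0;
}
```

Splitting this way would let the first patch be reviewed and tested
without watchman installed, leaving the second patch to do nothing
but translate watcher output into fs_cache_update() calls.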

I backed away from watchman support because I was worried about its
overhead (of watchman itself, and of git/watchman IPC, because
watchman is not designed specifically for git). That led me to
optimize git as much as possible without watchman first, and then see
how, or whether, watchman can help on top of that. I still think it's
a good approach, maybe because it has started to make me doubt whether
watchman can deliver a big enough performance win on top to justify
the changes needed to support it.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html