Re: [PATCH v2 03/17] index-helper: new daemon for caching index and related stuff

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Mar 19, 2016 at 8:04 AM, David Turner <dturner@xxxxxxxxxxxxxxxx> wrote:
> From: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
>
> Instead of reading the index from disk and worrying about disk
> corruption, the index is cached in memory (memory bit-flips happen
> too, but hopefully less often). The result is faster read. Read time
> is reduced by 70%.
>
> The biggest gain is not having to verify the trailing SHA-1, which
> takes lots of time especially on large index files. But this also
> opens doors for further optimiztions:
>
>  - we could create an in-memory format that's essentially the memory
>    dump of the index to eliminate most of parsing/allocation
>    overhead. The mmap'd memory can be used straight away. Experiment
>    [1] shows we could reduce read time by 88%.

This reference [1] is missing (even in my old version). I believe it's
http://thread.gmane.org/gmane.comp.version-control.git/247268/focus=248771,
comparing 256.442ms in that mail with v4 number, 2245.113ms in 0/8
mail from the same thread.

> Git can poke the daemon via named pipes to tell it to refresh the
> index cache, or to keep it alive some more minutes. It can't give any
> real index data directly to the daemon. Real data goes to disk first,
> then the daemon reads and verifies it from there. Poking only happens
> for $GIT_DIR/index, not temporary index files.

I think we should go with unix socket on *nix platform instead of
named pipe. UNIX named pipe only allows one communication channel at a
time. Windows named pipe is different and allows multiple clients,
which is the same as unix socket.


> $GIT_DIR/index-helper.pipe is the named pipe for daemon process. The
> daemon reads from the pipe and executes commands.  Commands that need
> replies from the daemon will have to open their own pipe, since a
> named pipe should only have one reader.  Unix domain sockets don't
> have this problem, but are less portable.

Hm..NO_UNIX_SOCKETS is only set for Windows in config.mak.uname and
Windows will need to be specially tailored anyway, I think unix socket
would be more elegant.

> +static void share_index(struct index_state *istate, struct shm *is)
> +{
> +       void *new_mmap;
> +       if (istate->mmap_size <= 20 ||
> +           hashcmp(istate->sha1,
> +                   (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
> +           !hashcmp(istate->sha1, is->sha1) ||
> +           git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, istate->mmap_size,
> +                       &new_mmap, PROT_READ | PROT_WRITE, MAP_SHARED,
> +                       "git-index-%s", sha1_to_hex(istate->sha1)) < 0)
> +               return;
> +
> +       release_index_shm(is);
> +       is->size = istate->mmap_size;
> +       is->shm = new_mmap;
> +       hashcpy(is->sha1, istate->sha1);
> +       memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
> +
> +       /*
> +        * The trailing hash must be written last after everything is
> +        * written. It's the indication that the shared memory is now
> +        * ready.
> +        */
> +       hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);

You commented here [1] a long time ago about memory barrier. I'm not
entirely sure if compilers dare to reorder function calls, but when
hashcpy is inlined and memcpy is builtin, I suppose that's possible...

[1] http://article.gmane.org/gmane.comp.version-control.git/280729

> +}
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]