On Sat, Mar 19, 2016 at 8:04 AM, David Turner <dturner@xxxxxxxxxxxxxxxx> wrote:
> From: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
>
> Instead of reading the index from disk and worrying about disk
> corruption, the index is cached in memory (memory bit-flips happen
> too, but hopefully less often). The result is faster read. Read time
> is reduced by 70%.
>
> The biggest gain is not having to verify the trailing SHA-1, which
> takes lots of time especially on large index files. But this also
> opens doors for further optimizations:
>
> - we could create an in-memory format that's essentially the memory
>   dump of the index to eliminate most of parsing/allocation
>   overhead. The mmap'd memory can be used straight away. Experiment
>   [1] shows we could reduce read time by 88%.

This reference [1] is missing (even in my old version). I believe it's
http://thread.gmane.org/gmane.comp.version-control.git/247268/focus=248771,
which compares 256.442ms in that mail with the v4 number, 2245.113ms,
from the 0/8 mail of the same thread.

> Git can poke the daemon via named pipes to tell it to refresh the
> index cache, or to keep it alive some more minutes. It can't give any
> real index data directly to the daemon. Real data goes to disk first,
> then the daemon reads and verifies it from there. Poking only happens
> for $GIT_DIR/index, not temporary index files.

I think we should use Unix domain sockets on *nix platforms instead of
named pipes. A Unix named pipe only allows one communication channel
at a time. Windows named pipes are different: they allow multiple
clients, just like Unix sockets.

> $GIT_DIR/index-helper.pipe is the named pipe for daemon process. The
> daemon reads from the pipe and executes commands. Commands that need
> replies from the daemon will have to open their own pipe, since a
> named pipe should only have one reader. Unix domain sockets don't
> have this problem, but are less portable.
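To illustrate why sockets remove the need for per-command reply pipes, here is a minimal sketch of the Unix-socket side in C. `unix_socket()`, `read_exact()`, `demo_two_clients()` and the /tmp socket path are all hypothetical, not code from the series. Each accept()ed connection is a private, bidirectional channel, so two clients can talk to the daemon at the same time and the daemon answers on the connection it read from.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a Unix domain stream socket at `path`.
 * The daemon binds and listens; a client connects. Returns fd or -1.
 */
static int unix_socket(const char *path, int is_daemon)
{
	struct sockaddr_un sa;
	int fd, rc;

	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	if (strlen(path) >= sizeof(sa.sun_path))
		return -1;
	strcpy(sa.sun_path, path);
	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (is_daemon) {
		unlink(path); /* drop a stale socket left by an earlier daemon */
		rc = bind(fd, (struct sockaddr *)&sa, sizeof(sa));
		if (!rc)
			rc = listen(fd, 8);
	} else {
		rc = connect(fd, (struct sockaddr *)&sa, sizeof(sa));
	}
	if (rc < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Read exactly `len` bytes; stream sockets may return short reads. */
static int read_exact(int fd, char *buf, size_t len)
{
	while (len) {
		ssize_t n = read(fd, buf, len);
		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Two clients talk to one daemon socket at once. Each tags its command
 * with an id; returns 0 if each client receives only its own reply.
 */
static int demo_two_clients(void)
{
	char path[64], reply[5];
	int srv, cli[2] = { -1, -1 }, conn[2] = { -1, -1 }, i, ret = -1;

	snprintf(path, sizeof(path), "/tmp/index-helper-demo-%ld.sock",
		 (long)getpid());
	srv = unix_socket(path, 1);
	if (srv < 0)
		return -1;
	cli[0] = unix_socket(path, 0);
	cli[1] = unix_socket(path, 0);
	if (cli[0] < 0 || cli[1] < 0)
		goto out;
	/* Both connections wait in the listen backlog until accepted. */
	conn[0] = accept(srv, NULL, NULL);
	conn[1] = accept(srv, NULL, NULL);
	if (conn[0] < 0 || conn[1] < 0)
		goto out;
	if (write(cli[0], "c1", 2) != 2 || write(cli[1], "c2", 2) != 2)
		goto out;
	for (i = 0; i < 2; i++) {
		char msg[5] = { 'o', 'k', ':', 0, 0 };

		/* Daemon: read the client's id, answer on the same connection. */
		if (read_exact(conn[i], msg + 3, 2) < 0 ||
		    write(conn[i], msg, 5) != 5)
			goto out;
	}
	if (read_exact(cli[0], reply, 5) == 0 && !memcmp(reply, "ok:c1", 5) &&
	    read_exact(cli[1], reply, 5) == 0 && !memcmp(reply, "ok:c2", 5))
		ret = 0;
out:
	for (i = 0; i < 2; i++) {
		close(cli[i]); /* close(-1) just fails with EBADF */
		close(conn[i]);
	}
	close(srv);
	unlink(path);
	return ret;
}
```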
Hm... NO_UNIX_SOCKETS is only set for Windows in config.mak.uname, and
Windows will need to be specially tailored anyway, so I think Unix
sockets would be more elegant.

> +static void share_index(struct index_state *istate, struct shm *is)
> +{
> +	void *new_mmap;
> +	if (istate->mmap_size <= 20 ||
> +	    hashcmp(istate->sha1,
> +		    (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
> +	    !hashcmp(istate->sha1, is->sha1) ||
> +	    git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, istate->mmap_size,
> +			&new_mmap, PROT_READ | PROT_WRITE, MAP_SHARED,
> +			"git-index-%s", sha1_to_hex(istate->sha1)) < 0)
> +		return;
> +
> +	release_index_shm(is);
> +	is->size = istate->mmap_size;
> +	is->shm = new_mmap;
> +	hashcpy(is->sha1, istate->sha1);
> +	memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
> +
> +	/*
> +	 * The trailing hash must be written last after everything is
> +	 * written. It's the indication that the shared memory is now
> +	 * ready.
> +	 */
> +	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);

You commented here [1] a long time ago about a memory barrier. I'm not
entirely sure whether compilers dare to reorder function calls, but when
hashcpy is inlined and memcpy is a builtin, I suppose that's possible...

[1] http://article.gmane.org/gmane.comp.version-control.git/280729

> +}
-- 
Duy
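On the reordering question in share_index(): a minimal sketch, assuming C11 <stdatomic.h> is acceptable, of a release fence between the body copy and the trailing-hash write, with a matching acquire fence on the reading side. `publish_index_shm()` and `index_shm_ready()` are hypothetical names, not functions from the patch. A compiler barrier alone (e.g. GCC's `asm volatile("" ::: "memory")`) stops only the compiler from moving the stores; weakly ordered CPUs such as ARM can still make them visible out of order to another process, and the fence covers both cases. Strictly speaking, C11 only guarantees fence-based synchronization through an atomic object, so this orders the stores rather than forming a fully race-free protocol.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20 /* SHA-1 trailer, as in share_index() */

/*
 * Hypothetical writer: copy everything except the trailing hash into the
 * shared mapping, then write the hash. The release fence keeps the body
 * stores from being reordered after the hash stores, so a reader that
 * sees the right hash also sees the finished body.
 * Assumes size > HASH_LEN, which share_index() already checks.
 */
static void publish_index_shm(unsigned char *shm, const unsigned char *src,
			      size_t size)
{
	memcpy(shm, src, size - HASH_LEN);
	atomic_thread_fence(memory_order_release);
	memcpy(shm + size - HASH_LEN, src + size - HASH_LEN, HASH_LEN);
}

/*
 * Hypothetical reader: treat the mapping as ready only once the trailing
 * hash matches the expected one. The acquire fence keeps later reads of
 * the body from being performed before this check.
 */
static int index_shm_ready(const unsigned char *shm, size_t size,
			   const unsigned char *expected)
{
	if (memcmp(shm + size - HASH_LEN, expected, HASH_LEN))
		return 0;
	atomic_thread_fence(memory_order_acquire);
	return 1;
}
```

In share_index() this would amount to a fence between the memcpy() and the final hashcpy(), plus a matching fence wherever the reading side checks the trailer before trusting the mapping.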