On Wed, 2016-03-09 at 15:09 -0800, Junio C Hamano wrote:
> David Turner <dturner@xxxxxxxxxxxxxxxx> writes:
>
> > From: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
> >
> > Instead of reading the index from disk and worrying about disk
> > corruption, the index is cached in memory (memory bit-flips happen
> > too, but hopefully less often). The result is faster reads. Read time
> > is reduced by 70%.
> >
> > The biggest gain is not having to verify the trailing SHA-1, which
> > takes lots of time especially on large index files. But this also
> > opens doors for further optimizations:
> >
> > - we could create an in-memory format that's essentially the memory
> >   dump of the index to eliminate most of parsing/allocation
> >   overhead. The mmap'd memory can be used straight away. Experiment
> >   [1] shows we could reduce read time by 88%.
> >
> > - we could cache non-index info such as name hash
> >
> > The shared memory's name follows the template "git-<object>-<SHA1>"
> > where <SHA1> is the trailing SHA-1 of the index file. <object> is
> > "index" for cached index files (and may be "name-hash" for name-hash
> > cache). If such shared memory exists, it contains the same index
> > content as on disk. The content is already validated by the daemon and
> > git won't validate it again (except comparing the trailing SHA-1s).
>
> This indeed is an interesting approach; what is not explained but
> must be is when the on-disk index is updated to reflect the reality
> (if I am reading the explanation and the code right, while the
> daemon is running, its in-core cache becomes the source of truth by
> forcing everybody's read-index-from() to go to the daemon). The
> explanation could be "this is only for read side, and updating the
> index happens via the traditional 'write a new file and rename it to
> the final place' codepath, at which time the daemon must be told to
> re-read it."
This seems to be the explanation (from the current commit message):

"Git can poke the daemon to tell it to refresh the index cache, or to
keep it alive some more minutes via UNIX signals. It can't give any
real index data directly to the daemon. Real data goes to disk first,
then the daemon reads and verifies it from there. Poking only happens
for $GIT_DIR/index, not temporary index files."

I guess this could be rewritten as:

"Index validity is ensured as follows: when a read is requested from
index-helper, it compares the trailing SHA-1 of its cached index copy
against that of the on-disk index. If they differ, index-helper
rereads the index. Any git process may also explicitly request a
reread via a UNIX signal, but this is only an optimization and is not
necessary for correctness. Finally, Git can send the daemon a
heartbeat signal to keep it alive longer."

How does that sound?
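For illustration, a minimal C sketch of the read-side check described above. It assumes only what the thread states: the index file ends with a 20-byte SHA-1 trailer, the shared memory is named "git-<object>-<SHA1>" from that trailer, and the cached copy is trusted if its trailer matches the on-disk one. The function names (index_shm_name, trailing_sha1, cached_index_is_valid) are hypothetical and are not index-helper's actual API; the sketch does not open shared memory, send signals, or hash anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define HASH_LEN 20 /* length of the SHA-1 trailer in bytes */

/* Hypothetical helper: write "git-<object>-<hex SHA-1>" into buf.
 * Returns 0 on success, -1 if buf is too small for the full name. */
static int index_shm_name(char *buf, size_t len, const char *object,
                          const unsigned char sha1[HASH_LEN])
{
    static const char hex[] = "0123456789abcdef";
    char hexsha[2 * HASH_LEN + 1];

    for (int i = 0; i < HASH_LEN; i++) {
        hexsha[2 * i] = hex[sha1[i] >> 4];
        hexsha[2 * i + 1] = hex[sha1[i] & 0xf];
    }
    hexsha[2 * HASH_LEN] = '\0';

    int n = snprintf(buf, len, "git-%s-%s", object, hexsha);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return a pointer to the trailing SHA-1 of an index image,
 * or NULL if the image is too short to contain one. */
static const unsigned char *trailing_sha1(const unsigned char *data,
                                          size_t size)
{
    return size < HASH_LEN ? NULL : data + size - HASH_LEN;
}

/* Hypothetical check: the cached copy is usable only if its trailer
 * matches the trailer of the index currently on disk. A mismatch
 * means the helper must reread the on-disk index. */
static int cached_index_is_valid(const unsigned char *cached,
                                 size_t cached_size,
                                 const unsigned char *ondisk,
                                 size_t ondisk_size)
{
    const unsigned char *a = trailing_sha1(cached, cached_size);
    const unsigned char *b = trailing_sha1(ondisk, ondisk_size);

    return a && b && memcmp(a, b, HASH_LEN) == 0;
}
```

Comparing only the 20-byte trailers is what keeps the read cheap: the full SHA-1 over the index content is verified once, by the daemon, when it (re)reads the file, and readers skip that verification.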