git@xxxxxxxxxxxxxxxxx wrote: > From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> > > Teash do_read_index() in read-cache.c to call verify_hdr() s/Teash/Teach/ > in a background thread while the forground thread parses s/forground/foreground/ > the index and builds the_index. > > This is a performance optimization to reduce the overall > time required to get the index into memory. > > Testing on Windows (using the OpenSSL SHA1 routine) showed > that parsing the index and computing the SHA1 take almost > equal time, so this patch effectively reduces the startup > time by 1/2. > > Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> Nice. Do you have example commands I can run to reproduce that benchmark? (Even better if you can phrase that as a patch against t/perf/.) [...] > --- a/read-cache.c > +++ b/read-cache.c > @@ -1564,6 +1564,83 @@ static void post_read_index_from(struct index_state *istate) > tweak_untracked_cache(istate); > } > > +#ifdef NO_PTHREADS > + > +struct verify_hdr_thread_data { > + struct cache_header *hdr; > + size_t size; 'size' appears to always be cast to an unsigned long when it's used. Why not use unsigned long consistently? > +}; > + > +/* > + * Non-threaded version does all the work immediately. > + * Returns < 0 on error or bad signature. > + */ > +static int verify_hdr_start(struct verify_hdr_thread_data *d) > +{ > + return verify_hdr(d->hdr, (unsigned long)d->size); > +} > + > +static int verify_hdr_finish(struct verify_hdr_thread_data *d) > +{ > + return 0; > +} > + > +#else > + > +#include <pthread.h> Please put this at the top of the file with other #includes. One simple way to do that is to #include "thread-utils.h" at the top of the file unconditionally. > + > +/* > + * Require index file to be larger than this threshold before > + * we bother using a background thread to verify the SHA. > + */ > +#define VERIFY_HDR_THRESHOLD (1024) nits: (1) no need for parens for a numerical macro like this (2) comment can be made briefer and more explicit: /* * Index size threshold in bytes before it's worth bothering to * use a background thread to verify the index file. */ How was this value chosen? > + > +struct verify_hdr_thread_data { > + pthread_t thread_id; > + struct cache_header *hdr; > + size_t size; > + int result; > +}; All structs are data. Other parts of git seem to name this kind of callback cookie *_cb_data, so perhaps verify_hdr_cb_data? On the other hand this seems to also be used by the caller as a handle to the async verify_hdr process. Maybe verify_hdr_state? This seems to be doing something similar to the existing 'struct async' interface. Could it use that instead, or does it incur too much overhead? An advantage would be avoiding having to handle the NO_PTHREADS ifdef-ery. > + > +/* > + * Thread proc to run verify_hdr() computation in a background thread. > + */ > +static void *verify_hdr_thread_proc(void *_data) Please don't name identifiers with a leading underscore --- those are reserved names. > +{ > + struct verify_hdr_thread_data *d = _data; > + d->result = verify_hdr(d->hdr, (unsigned long)d->size); > + return NULL; > +} > + > +/* > + * Threaded version starts background thread and returns zero > + * to indicate that we don't know the hash is bad yet. If the > + * index is too small, we just do the work imediately. > + */ > +static int verify_hdr_start(struct verify_hdr_thread_data *d) This comment restates what the code says. Is there background or something about the intent behind the code that could be said instead to help the reader? Otherwise I'd suggest removing the comment. [...] > @@ -1574,6 +1651,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) > void *mmap; > size_t mmap_size; > struct strbuf previous_name_buf = STRBUF_INIT, *previous_name; > + struct verify_hdr_thread_data data; > > if (istate->initialized) > return istate->cache_nr; > @@ -1600,7 +1678,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) > close(fd); > > hdr = mmap; > - if (verify_hdr(hdr, mmap_size) < 0) > + > + data.hdr = hdr; > + data.size = mmap_size; > + if (verify_hdr_start(&data) < 0) > goto unmap; What happens if there is an error before the code reaches the end of the function? I think there needs to be a verify_hdr_finish call in the 'unmap:' cleanup section. The rest looks reasonable. Thanks and hope that helps, Jonathan