Re: [PATCH] read-cache: call verify_hdr() in a background thread

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



git@xxxxxxxxxxxxxxxxx wrote:

> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
>
> Teash do_read_index() in read-cache.c to call verify_hdr()

s/Teash/Teach/

> in a background thread while the forground thread parses

s/forground/foreground/

> the index and builds the_index.
>
> This is a performance optimization to reduce the overall
> time required to get the index into memory.
>
> Testing on Windows (using the OpenSSL SHA1 routine) showed
> that parsing the index and computing the SHA1 take almost
> equal time, so this patch effectively reduces the startup
> time by 1/2.
>
> Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Nice.  Do you have example commands I can run to reproduce
that benchmark?  (Even better if you can phrase that as a
patch against t/perf/.)

[...]
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1564,6 +1564,83 @@ static void post_read_index_from(struct index_state *istate)
>  	tweak_untracked_cache(istate);
>  }
>  
> +#ifdef NO_PTHREADS
> +
> +struct verify_hdr_thread_data {
> +	struct cache_header *hdr;
> +	size_t size;

'size' appears to always be cast to an unsigned long when it's
used.  Why not use unsigned long consistently?

> +};
> +
> +/*
> + * Non-threaded version does all the work immediately.
> + * Returns < 0 on error or bad signature.
> + */
> +static int verify_hdr_start(struct verify_hdr_thread_data *d)
> +{
> +	return verify_hdr(d->hdr, (unsigned long)d->size);
> +}
> +
> +static int verify_hdr_finish(struct verify_hdr_thread_data *d)
> +{
> +	return 0;
> +}
> +
> +#else
> +
> +#include <pthread.h>

Please put this at the top of the file with other #includes.  One
simple way to do that is to #include "thread-utils.h" at the top of
the file unconditionally.

> +
> +/*
> + * Require index file to be larger than this threshold before
> + * we bother using a background thread to verify the SHA.
> + */
> +#define VERIFY_HDR_THRESHOLD    (1024)

nits: (1) no need for parens for a numerical macro like this
(2) comment can be made briefer and more explicit:

/*
 * Index size threshold in bytes before it's worth bothering to
 * use a background thread to verify the index file.
 */

How was this value chosen?

> +
> +struct verify_hdr_thread_data {
> +	pthread_t thread_id;
> +	struct cache_header *hdr;
> +	size_t size;
> +	int result;
> +};

All structs are data.  Other parts of git seem to name this kind of
callback cookie *_cb_data, so perhaps verify_hdr_cb_data?

On the other hand this seems to also be used by the caller as a handle
to the async verify_hdr process. Maybe verify_hdr_state?

This seems to be doing something similar to the existing 'struct
async' interface.  Could it use that instead, or does it incur too
much overhead?  An advantage would be avoiding having to handle the
NO_PTHREADS ifdef-ery.

> +
> +/*
> + * Thread proc to run verify_hdr() computation in a background thread.
> + */
> +static void *verify_hdr_thread_proc(void *_data)

Please don't name identifiers with a leading underscore --- those are
reserved names.

> +{
> +	struct verify_hdr_thread_data *d = _data;
> +	d->result = verify_hdr(d->hdr, (unsigned long)d->size);
> +	return NULL;
> +}
> +
> +/*
> + * Threaded version starts background thread and returns zero
> + * to indicate that we don't know the hash is bad yet.  If the
> + * index is too small, we just do the work imediately.
> + */
> +static int verify_hdr_start(struct verify_hdr_thread_data *d)

This comment restates what the code says.  Is there background or
something about the intent behind the code that could be said instead
to help the reader?  Otherwise I'd suggest removing the comment.

[...]
> @@ -1574,6 +1651,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  	void *mmap;
>  	size_t mmap_size;
>  	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> +	struct verify_hdr_thread_data data;
>  
>  	if (istate->initialized)
>  		return istate->cache_nr;
> @@ -1600,7 +1678,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
>  	close(fd);
>  
>  	hdr = mmap;
> -	if (verify_hdr(hdr, mmap_size) < 0)
> +
> +	data.hdr = hdr;
> +	data.size = mmap_size;
> +	if (verify_hdr_start(&data) < 0)
>  		goto unmap;

What happens if there is an error before the code reaches the end of
the function?  I think there needs to be a verify_hdr_finish call in
the 'unmap:' cleanup section.

The rest looks reasonable.

Thanks and hope that helps,
Jonathan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]