Re: [RFC PATCH 10/10] pack-objects: improve partial packfile reuse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Christian Couder <christian.couder@xxxxxxxxx> writes:

> +/*
> + * Record the offsets needed in our reused packfile chunks due to
> + * "gaps" where we omitted some objects.
> + */

The meaning of 'start' and 'offset' is unclear from the first
reading.  Is it "starting offset" and "for how many bytes the region
lasts"?  If so, 'offset', which is usually a location (unless you
always measure from the beginning, in which case you could say it
names the byte-size of a region), may be a misnomer (side note: I'll
pretend that I haven't realized the 'offset' may be the end offset
of a region for now---this is a good illustration why a better
comment should be here anyway ;-).

> +static struct reused_chunk {
> +	off_t start;
> +	off_t offset;
> +} *reused_chunks;
> +static int reused_chunks_nr;
> +static int reused_chunks_alloc;
> +
> +static void record_reused_object(off_t where, off_t offset)

And here, 'start' is called 'where'; either is a good word for a
location; we'd want to pick either one to be consistent, perhaps?

>  {
> -	unsigned char buffer[8192];
> -	off_t to_write, total;
> -	int fd;
> +	if (reused_chunks_nr && reused_chunks[reused_chunks_nr-1].offset == offset)
> +		return;

The reason why the above code works is because this function will
always be called in the ascending order of the 'offset'?

Hmmm, perhaps 'offset' is not a region-size after all.  Is it the
end offset, as opposed to 'start' which is the starting offset, and
the two offsets sandwitch a region?

> -	if (!is_pack_valid(reuse_packfile))
> -		die(_("packfile is invalid: %s"), reuse_packfile->pack_name);
> +	ALLOC_GROW(reused_chunks, reused_chunks_nr + 1,
> +		   reused_chunks_alloc);
> +	reused_chunks[reused_chunks_nr].start = where;
> +	reused_chunks[reused_chunks_nr].offset = offset;
> +	reused_chunks_nr++;
> +}
>  
> -	fd = git_open(reuse_packfile->pack_name);
> -	if (fd < 0)
> -		die_errno(_("unable to open packfile for reuse: %s"),
> -			  reuse_packfile->pack_name);
> +/*
> + * Binary search to find the chunk that "where" is in. Note
> + * that we're not looking for an exact match, just the first
> + * chunk that contains it (which implicitly ends at the start
> + * of the next chunk.
> + */
> +static off_t find_reused_offset(off_t where)
> +{
> +	int lo = 0, hi = reused_chunks_nr;
> +	while (lo < hi) {
> +		int mi = lo + ((hi - lo) / 2);
> +		if (where == reused_chunks[mi].start)
> +			return reused_chunks[mi].offset;
> +		if (where < reused_chunks[mi].start)
> +			hi = mi;
> +		else
> +			lo = mi + 1;
> +	}
>  
> -	if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
> -		die_errno(_("unable to seek in reused packfile"));
> +	/*
> +	 * The first chunk starts at zero, so we can't have gone below
> +	 * there.
> +	 */
> +	assert(lo);
> +	return reused_chunks[lo-1].offset;
> +}

This comment has nothing to do with the change, but the way the
patch is presented is quite hard to follow, in that the preimage or
the common context lines do not help understand what the new code is
doing at all ;-)

I'll come back to the remainder of the patch later.  Thanks.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux