Re: [BUG] add_again() off-by-one error in custom format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jun 18, 2017 at 12:58:37PM +0200, René Scharfe wrote:

> An oid_array spends ca. 30 bytes per entry (20 bytes for a hash times
> a factor of 1.5 from alloc_nr).  How many loose objects do we have to
> expect?  30 MB for a million of them sounds not too bad, but can there
> realistically be orders of magnitude more?

Good point. We gc by default at 6000 loose objects, and lots of thing
will suffer poor performance by the time you hit a million. So a little
extra space is probably not worth worrying about.

> So here's a patch for caching loose object hashes in an oid_array
> without a way to invalidate or release the cache.  It uses a single
> oid_array for simplicity, but reads each subdirectory individually and
> remembers that fact using a bitmap.

I like the direction of this very much.

> @@ -1586,6 +1587,9 @@ extern struct alternate_object_database {
>  	struct strbuf scratch;
>  	size_t base_len;
>  
> +	uint32_t loose_objects_subdir_bitmap[8];

Is it worth the complexity of having an actual bitmap and not just an
array of char? I guess it's not _that_ complex to access the bits, but
there are a lot of magic numbers involved.

> +	struct oid_array loose_objects_cache;

We should probably insert a comment here warning that the cache may go
out of date during the process run, and should only be used when that's
an acceptable tradeoff.

> +static void read_loose_object_subdir(struct alternate_object_database *alt,
> +				     int subdir_nr)

I think it's nice to pull this out into a helper function. I do wonder
if it should just be reusing for_each_loose_file_in_objdir(). You'd just
need to expose the in_obj_subdir() variant publicly.

It does do slightly more than we need (for the callbacks it actually
builds the filename), but I doubt memcpy()ing a few bytes would be
measurable.

> +		pos = oid_array_lookup(&alt->loose_objects_cache, &ds->bin_pfx);
> +		if (pos < 0)
> +			pos = -1 - pos;
> +		while (!ds->ambiguous && pos < alt->loose_objects_cache.nr) {
> +			const struct object_id *oid;
> +			oid = alt->loose_objects_cache.oid + pos;
> +			if (!match_sha(ds->len, ds->bin_pfx.hash, oid->hash))
> +				break;
> +			update_candidates(ds, oid);
> +			pos++;
>  		}

This part looks quite nice and straightforward.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]