Re: [PATCH v7 11/13] pack-objects: shrink size field in struct object_entry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Mar 24, 2018 at 07:33:51AM +0100, Nguyễn Thái Ngọc Duy wrote:

> It's very very rare that an uncompressed object is larger than 4GB
> (partly because Git does not handle those large files very well to
> begin with). Let's optimize it for the common case where object size
> is smaller than this limit.
> 
> Shrink size field down to 32 bits [1] and one overflow bit. If the
> size is too large, we read it back from disk. As noted in the previous
> patch, we need to return the delta size instead of canonical size when
> the to-be-reused object entry type is a delta instead of a canonical
> one.
> 
> Add two compare helpers that can take advantage of the overflow
> bit (e.g. if the file is 4GB+, chances are it's already larger than
> core.bigFileThreshold and there's no point in comparing the actual
> value).
> 
> Another note about oe_get_size_slow(). This function MUST be thread
> safe because SIZE() macro is used inside try_delta() which may run in
> parallel. Outside parallel code, no-contention locking should be dirt
> cheap (or insignificant compared to i/o access anyway). To exercise
> this code, it's best to run the test suite with something like
> 
>     make test GIT_TEST_OE_SIZE_BITS=2
> 
> which forces this code on all objects larger than 3 bytes.

OK, makes sense. Since we need it to be thread-safe, we have to use
read_lock(). Which means that oe_get_size_slow() is defined in
builtin/pack-objects.c. But the object_entry is defined in the
more-generic pack-objects.h.

So anybody besides builtin/pack-objects.c will have to implement their
own fallback when e->size_valid isn't true. Which is a little odd, but I
guess nobody else needs that field. It might bite us in the future, but
I'm willing to cross my fingers for now (the pack-objects.h header is
really just there to support the bitmap writing code, but even that
could in theory all get shoved into a single translation unit if we had
to).

> [1] it's actually already 32 bits on Windows

And linux-i386. :)

> +unsigned long oe_get_size_slow(struct packing_data *pack,
> +			       const struct object_entry *e)
> +{

I think I already replied about this earlier, so I'll skim over it this
time.

> diff --git a/pack-objects.c b/pack-objects.c
> index 13f2b2bff2..59c6e40a02 100644
> --- a/pack-objects.c
> +++ b/pack-objects.c
> @@ -120,8 +120,15 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
>  {
>  	struct object_entry *new_entry;
>  
> -	if (!pdata->nr_objects)
> +	if (!pdata->nr_objects) {
>  		prepare_in_pack_by_idx(pdata);
> +		if (getenv("GIT_TEST_OE_SIZE_BITS")) {
> +			int bits = atoi(getenv("GIT_TEST_OE_SIZE_BITS"));;
> +			pdata->oe_size_limit = 1 << bits;
> +		}
> +		if (!pdata->oe_size_limit)
> +			pdata->oe_size_limit = 1 << OE_SIZE_BITS;
> +	}

This needs to be "1U << OE_SIZE_BITS". Shifting a signed integer 31 bits
is undefined.

No, I'm not that clever or careful myself. I ran the whole test suite
with SANITIZE=address,undefined and it turned this up, as well as a
similar case for OE_DELTA_SIZE_BITS.

> +	uint32_t size_:OE_SIZE_BITS;
> +	uint32_t size_valid:1;

A uint32_t bitfield? Would it make more sense to just call these
"unsigned", since we're specifying the precision already?

> +unsigned long oe_get_size_slow(struct packing_data *pack,
> +			       const struct object_entry *e);
> +static inline unsigned long oe_size(struct packing_data *pack,
> +				    const struct object_entry *e)
> +{
> +	if (e->size_valid)
> +		return e->size_;
> +
> +	return oe_get_size_slow(pack, e);
> +}

If oe_get_size_slow() fails to find an object's size, it dies. I'm
trying to think of whether that might hit funny corner cases with
racing. I don't _think_ so, because if the object truly goes away, we'd
be screwed during the writing phase anyway.

> +static inline int oe_size_less_than(struct packing_data *pack,
> +				    const struct object_entry *lhs,
> +				    unsigned long rhs)
> +{
> +	if (lhs->size_valid)
> +		return lhs->size_ < rhs;
> +	if (rhs < pack->oe_size_limit) /* rhs < 2^x <= lhs ? */
> +		return 0;
> +	return oe_get_size_slow(pack, lhs) < rhs;
> +}

Clever.

> +static inline void oe_set_size(struct packing_data *pack,
> +			       struct object_entry *e,
> +			       unsigned long size)
> +{
> +	if (size < pack->oe_size_limit) {
> +		e->size_ = size;
> +		e->size_valid = 1;
> +	} else {
> +		e->size_valid = 0;
> +		if (oe_get_size_slow(pack, e) != size)
> +			die("BUG: 'size' is supposed to be the object size!");
> +	}
> +}

That's an expensive assertion. But I guess this isn't supposed to happen
very frequently, so it's probably OK.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux