Re: [PATCH] diff-delta.c: pack the index structure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Nicolas Pitre <nico@xxxxxxx> writes:

> On Sat, 8 Sep 2007, David Kastrup wrote:
>
>> In normal use cases, the performance wins are not overly impressive:
>> we get something like 5-10% due to the slightly better locality of
>> memory accesses using the packed structure.
>
> The gain is probably counterbalanced by the fact that you're copying
> the whole index when packing it, which is unfortunate.

It was a design choice (I don't particularly like it myself).  An
index is created once, used a dozen times.  Doing the packing in-place
implies using realloc on the finished index.  I consider the
likelihood of permanent memory fragmentation higher when doing that
rather than when allocating a fresh area of the same size.

Also a repack in-place is going to cost more operations and have quite
more complicated code.

> Also, could you provide actual test results backing your performance 
> claim?  5-10% is still not negligible.

I did in my git repository

for i in .git/objects/[0-9a-f][0-9a-f]/[0-9a-f]*;do echo $i;done|sed 's+.*ts/\(..\)/+\1+' > /tmp/objlist

and then something like

dak@lola:/usr/local/tmp/git$ time ./git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 3.99295 seconds, 526 kB/s

real	0m4.023s
user	0m3.812s
sys	0m0.168s
dak@lola:/usr/local/tmp/git$ time git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 4.18734 seconds, 501 kB/s

real	0m4.218s
user	0m4.012s
sys	0m0.160s
dak@lola:/usr/local/tmp/git$ 

repeatedly on a warm cache.  Results were pretty much comparable:
consistently my version was faster, but never more than 10%.

>> -	struct index_entry *entry, **hash;
>> +	struct unpacked_index_entry *entry, **hash;
>> +	struct index_entry *aentry, **ahash;
>
> What does the "a" stand for?

array (as opposed to linked list).  Was the first thing coming into my
mind.  Maybe I should have gone for "p" for packed, but I shied from
it because it often is meant to imply "pointer".

Alternatively I could replace "entry" with "uentry", but that affects
more lines.

What do you propose?

>> +	mem = index+1;
> [...]
>> +	for (i=0; i<hsize; i++) {
> [...]
>> +		for (entry=hash[i]; entry; entry=entry->next)
>
> Minor style nit: please add spaces around "+", "=", "<", etc. for 
> consistency.

Can do.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux