Re: [PATCH] diff-delta.c: pack the index structure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I answered this post already, but the answer seems to have disappeared
into a black hole.  What a nuisance.

Nicolas Pitre <nico@xxxxxxx> writes:

> On Sat, 8 Sep 2007, David Kastrup wrote:
>
>> In normal use cases, the performance wins are not overly impressive:
>> we get something like 5-10% due to the slightly better locality of
>> memory accesses using the packed structure.
>
> The gain is probably counterbalanced by the fact that you're copying
> the whole index when packing it, which is unfortunate.

The alternative would be packing in-place.  That's actually a rather
contorted operation, and the gains from packing would then have to be
claimed by "realloc".  I think that realloc is more likely to leave a
fragmented address space (giving the reclaimed memory to the heap
rather than back to the system), but I'd have to check glibc for
details at least on GNU/Linux.

Another option avoiding the realloc would be not to use linked lists
at all but just collect all entries sequentually in "packed" form, and
sort them into order afterwards.  That's an O(n lg n) operation with a
large n.  Even if one has a sorting algorithm with good memory
locality, I doubt that the locality would compensate for the lg n
factor, even when taking into account that we save ourselves the
copying.  And that is not even taken the possibility of having to cull
some buckets into account, another O(n) operation (which amounts to
copying everything once, too).

> Also, could you provide actual test results backing your performance 
> claim?  5-10% is still not negligible.


dak@lola:/usr/local/tmp/git$ for i in .git/objects/[0-9a-f][0-9a-f]/[0-9a-f]*;do echo $i;done|sed 's+.*ts/\(..\)/+\1+' > /tmp/objlist
dak@lola:/usr/local/tmp/git$ time ./git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 4.01225 seconds, 523 kB/s

real	0m4.043s
user	0m3.836s
sys	0m0.164s
dak@lola:/usr/local/tmp/git$ time git-pack-objects </tmp/objlist --stdout|dd of=/dev/null
4099+2 records in
4100+1 records out
2099205 bytes (2.1 MB) copied, 4.34936 seconds, 483 kB/s

real	0m4.381s
user	0m4.056s
sys	0m0.176s
dak@lola:/usr/local/tmp/git$ 

Of course, on a warm file system.

>> -	struct index_entry *entry, **hash;
>> +	struct unpacked_index_entry *entry, **hash;
>> +	struct index_entry *aentry, **ahash;
>
> What does the "a" stand for?

array (rather than linked list).  Should I rather rename entry and
hash to uentry and uhash (for unpacked)?

>> +	mem = index+1;
> [...]
>> +	for (i=0; i<hsize; i++) {
> [...]
>> +		for (entry=hash[i]; entry; entry=entry->next)
>
> Minor style nit: please add spaces around "+", "=", "<", etc. for 
> consistency.

Will do.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux