[RFD PATCH 0/3] Use "object index" rather than pointers in the object hashing

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 16 Apr 2007 21:12:50 -0700 (PDT)

This is a series of three patches that changes the low-level object 
hashing to use an "object index" rather than the pointer to a "struct 
object" in the hash-tables. It's something I've been thinking about for a 
long time, so I just decided to do it.

The reason to do it is that on 64-bit architectures the object hash table 
is actually a fairly sizeable entity, and not for a very good reason. It 
has a ton of pointers to the objects we have allocated, so each hash-table 
entry is 64 bits, even though obviously we aren't likely to ever have that 
many objects.

So instead, we could use a 32-bit index into an object table - and in 
fact, since we already do all normal object allocations using a special 
dense allocator that allocates 1024 objects in one go, we already kind of 
were set up for this, with the low 10 bits of the object index being a 
very natural index into each allocation block.
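
To make the index layout concrete, here is a minimal sketch in C of how a
32-bit object index could map onto 1024-object allocation blocks. It is not
taken from the patches: the names "blocks", "alloc_object_index()" and
"index_to_object()" and the stand-in "struct object" are made up for
illustration, and error handling is reduced to asserts.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_BITS 10                   /* 1024 objects per allocation block */
#define BLOCK_SIZE (1u << BLOCK_BITS)
#define BLOCK_MASK (BLOCK_SIZE - 1)

/* Stand-in for the real "struct object"; only the size matters here. */
struct object {
	unsigned char sha1[20];
};

static struct object **blocks;  /* hypothetical table of allocation blocks */
static uint32_t nr_objects;     /* number of indices handed out so far */

/* Hand out the next object slot and return its 32-bit index. */
static uint32_t alloc_object_index(void)
{
	uint32_t idx = nr_objects++;

	if ((idx & BLOCK_MASK) == 0) {
		/* First slot of a new block: grow the block table by one. */
		uint32_t nr_blocks = (idx >> BLOCK_BITS) + 1;

		blocks = realloc(blocks, nr_blocks * sizeof(*blocks));
		assert(blocks);
		blocks[nr_blocks - 1] = calloc(BLOCK_SIZE, sizeof(struct object));
		assert(blocks[nr_blocks - 1]);
	}
	return idx;
}

/* High bits select the block, low 10 bits select the slot inside it. */
static struct object *index_to_object(uint32_t idx)
{
	assert(idx < nr_objects);
	return &blocks[idx >> BLOCK_BITS][idx & BLOCK_MASK];
}
```

With 32-bit entries the hash table itself is half the size it would be with
64-bit pointers, at the cost of a shift, a mask and an extra load on each
lookup.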

Could we ever want more than 4 billion objects? Unlikely, since you'd 
actually need 80GB of memory just to keep track of the object names in 
such a hash table, but hey, if that day ever comes, we can certainly 
trivially make the index be 64-bit instead (or more likely, make it be 
48-bit and use 16 bits of the hash table entry as an extended hash value 
or something).
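
The 80GB figure is just 2^32 objects times a 20-byte SHA-1 name each. The
sketch below checks that arithmetic and shows one possible way to split a
64-bit hash table entry into a 48-bit index and 16 extra hash bits. The
layout (index in the low bits, hash bits on top) and all the names are
assumptions for illustration, not something the patches implement.

```c
#include <assert.h>
#include <stdint.h>

#define SHA1_BYTES 20                   /* size of one object name */
#define INDEX_BITS 48
#define INDEX_MASK ((UINT64_C(1) << INDEX_BITS) - 1)

/* Memory needed just to store the names of nr_objects objects. */
static uint64_t name_bytes(uint64_t nr_objects)
{
	return nr_objects * SHA1_BYTES;
}

/* Hypothetical entry layout: low 48 bits index, high 16 bits extra hash. */
static uint64_t pack_entry(uint64_t index, uint16_t hash_bits)
{
	assert(index <= INDEX_MASK);
	return ((uint64_t)hash_bits << INDEX_BITS) | index;
}

static uint64_t entry_index(uint64_t entry)
{
	return entry & INDEX_MASK;
}

static uint16_t entry_hash(uint64_t entry)
{
	return (uint16_t)(entry >> INDEX_BITS);
}
```

The extra hash bits would let a lookup reject most non-matching entries
without touching the object itself, which is the point of keeping them in
the table entry.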

Anyway, the before-and-after numbers are somewhat debatable, so this is 
purely a request for discussion.

Before:

	[torvalds@woody linux]$ /usr/bin/time git-rev-list --all --objects | wc -l
	5.66user 0.46system 0:06.12elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+44389minor)pagefaults 0swaps
	445065

After:

	[torvalds@woody linux]$ /usr/bin/time ~/git/git-rev-list --all --objects | wc -l
	6.96user 0.36system 0:07.36elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+40240minor)pagefaults 0swaps
	445065

i.e. it's actually slower (about 20% in elapsed time), but it uses almost 
10% less memory (minor page faults). Is it worth it? Probably not, but 
since I made the patches, I thought I'd post them anyway. And the first 
two patches are probably worth applying regardless - it's only the third 
patch that actually changes things to use a hash index.

Anyway, the three patches are:

	0001-Use-proper-object-allocators-for-unknown-object-node.patch
	0002-Clean-up-object-creation-to-use-more-common-code.patch
	0003-Make-the-object-lookup-hash-use-a-object-index-ins.patch

where 1-2 are pretty much just cleanups.

Comments?

		Linus