Re: [PATCH 2/2] Implement a simple delta_base cache

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Sat, 17 Mar 2007, Nicolas Pitre wrote:
> 
> Well... in my opinion it is the _current_ tree walker that is quite ugly 
> and complex.  It is always messier to parse strings than fixed width 
> binary fields.

Sure. On the other hand, text is what made things easy to do initially, 
and you're missing one *BIG* clue: you cannot remote the support without 
losing compatibility with all traditional object formats.

So you have no choice. You need to support the text representation. As a 
result, *your* code will now be way more ugly and messy.

The thing is, parsing some little text may sound expensive, but if the 
expense is in finding the end of the string, we're doing really well.

In other words: the data structures are both simple and straightforward, 
and the only reason strlen() shows up at all is:

 - we pass strings around as just C strings, even when we know their 
   lengths. Prime example: look at tree-diff.c. And when you look at it, 
   realize that *for*every*single*strlen* in that file except for the very 
   last one (which is only used once per process for setup) we actually 
   know the string length from before, but we (well, *I*) decided that it 
   wasn't worth passing down as a parameter all the time.

 - the simple parsing of the tree itself (which really isn't that 
   expensive - the real expense is bringing the data into the CPU cache, 
   but that's something we'd need to do *anyway*).

So I seriously suspect that you could get the strlen() overhead down from 
that 16% pretty easily, but you'd have to pass the length of the "base" 
string along all the time (and in the tree_entry cases you'd replace the 
"strlen()" calls with a call to something like

	static inline int tree_entry_len(const char *name, const unsigned char *sha1)
	{
		return (char *)sha1 - (char *)name - 1;
	}

which will do it for you).

But what you're ignoring here is that "16%" may sound like a huge deal, 
but it's 16% of somethng that takes 1 second, and that other SCM's cannot 
do AT ALL.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]