Re: [RFC] Submodules in GIT

Linus Torvalds <torvalds@xxxxxxxx> · Sat, 2 Dec 2006 13:29:48 -0800 (PST)

On Sat, 2 Dec 2006, Martin Waitz wrote:
> 
> Do I understand you correctly that the problem is not the algorithmic
> complexity but that you have to map the objects at once instead of map
> them in small parts one after the other?

Not map them, but track their "used" flag. Yes. You can unmap objects any 
time at all (since you can just always re-create them at any time very 
easily and cheaply), but the one thing you CANNOT recreate is the object 
flags. See "struct object", and the "used" and FLAG_BITS in particular.

Almost all git programs need the FLAG_BITS. Something as simple as just 
traversing the commit history needs at a minimum one _single_ bit for each 
object: "Have I already seen this". In reality, you tend to need two or 
three more (ie the UNINTERESTING bit ends up being as important as the 
SEEN bit, because it's what determines whether it's reachable from some 
commit we're _not_ interested in, and in the end that's what allows us to 
not traverse the whole history).

So you need at a MINIMUM to track the bits

	#define SEEN            (1u<<0)
	#define UNINTERESTING   (1u<<1)

and in practice almost everything needs

	#define SHOWN           (1u<<3)

too (SEEN is for deciding whether to _traverse_ something, SHOWN is for 
deciding whether you've already output the data for this, and the 
difference is crucial for any depth-first DAG algorithm, since you need 
to test-and-set the one bit when you first encounter the object, and 
test-and-set the other bit when you "leave" the object).

So three bits are minimal to _any_ git traversal algorithm. Many specific 
issues want more bits (eg the TREECHANGE bit may not be quite as 
fundamnetal, but it sure ends up being critical for the "track subtree" 
case).

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html