DAG scalability (was: Git commit generation numbers)

Shawn Pearce <spearce@xxxxxxxxxxx> · Sun, 17 Jul 2011 15:18:19 -0700

On Sun, Jul 17, 2011 at 12:30, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> The thing is, the very basic design of git is all about *incomplete*
> DAG traversal. The DAG traversal part is pretty obvious and simple,
> but the *partial* thing really is very very important. We absolutely
> need it for reasonable scalability.
...
> That's a *really* fundamental design issue in git. Performance was
> always a primary goal. And by primary, I really mean primary. As in
> "more important than just about anything else".  There were other
> primary goals, but really not very many.
>
> And there really aren't very good ways to limit DAG traversal.

What about `git clone`?  We're always recomputing the entire DAG
during it. For a public repository like yours that only contains
public objects, it's a horrible abuse of the servers that are serving
the repository...

Just saying, not everything we do winds up being a partial or
incomplete traversal in the name of performance. Sometimes we expend
1.5 minutes of CPU time *per request* on a busy server because we
don't want a cache, but then we're off in this bike shed painting
discussion about saving someone's desktop, what, 30 seconds of CPU time
via generation numbers in commits? Ugh. Cry me a river.

Maybe I only complain about the server-side utilization of clone
because I run servers that have a lot of clone traffic.

-- 
Shawn.