Re: Git's database structure

Jeff King <peff@xxxxxxxx> · Tue, 4 Sep 2007 13:09:22 -0400

On Tue, Sep 04, 2007 at 12:19:33PM -0400, Jon Smirl wrote:

> By introducing tree nodes you have blended a specific indexing scheme
> into the data. There are many other ways the path names could be
> indexed: hash tables, binary trees, etc.

That is correct. However, given that indexing scheme, many of the common
operations just "fall out" simply and efficiently, without the need to
keep separate indices. So yes, git is geared towards a particular set of
operations.
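
To make "fall out" concrete, here is a rough Python sketch of one such
operation: diffing two snapshots while skipping any subtree whose hash is
unchanged. This is a toy model, not git's real object format or code. The
nested-dict representation and the names hash_blob, hash_tree and
diff_trees are all made up for illustration.

```python
import hashlib

def hash_blob(data: bytes) -> str:
    # Toy content hash for a file; NOT git's real blob encoding.
    return hashlib.sha1(b"blob" + data).hexdigest()

def hash_tree(tree: dict) -> str:
    # A tree maps name -> bytes (file) or dict (subdirectory).
    # Its hash covers the names and hashes of its children, so any
    # change below a directory changes that directory's hash.
    h = hashlib.sha1(b"tree")
    for name in sorted(tree):
        entry = tree[name]
        child = hash_tree(entry) if isinstance(entry, dict) else hash_blob(entry)
        h.update(name.encode() + b"\0" + child.encode())
    return h.hexdigest()

def diff_trees(old: dict, new: dict, prefix: str = "") -> list:
    # Return the paths that differ between two snapshots.
    # Subdirectories with equal hashes are skipped without descending.
    # (This sketch recomputes hashes on every call; a real store would
    # keep them alongside the entries, making the comparison cheap.)
    changed = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        path = prefix + name
        if isinstance(a, dict) and isinstance(b, dict):
            if hash_tree(a) != hash_tree(b):
                changed += diff_trees(a, b, path + "/")
        elif a != b:
            changed.append(path)
    return changed

old = {
    "Makefile": b"all:\n",
    "doc": {"README": b"hi\n"},
    "src": {"a.c": b"int a;\n", "b.c": b"int b;\n"},
}
new = {
    "Makefile": b"all:\n",
    "doc": {"README": b"hi\n"},
    "src": {"a.c": b"int a;\n", "b.c": b"int b = 1;\n"},
}
print(diff_trees(old, new))
```

The point is that no separate index is consulted: the hierarchy itself
tells the diff which directories it never needs to open.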

Your complaint seems to be two-fold:

 1. there is an inelegance in the blending of data and indexing. The
    problem with changing this is:
      a. we are all already using git, and it would require completely
         re-vamping the core data structure
      b. there is some feeling that the blending is necessary for
         performance. Given the difficulty of (a), I think you would
         have to provide compelling evidence (i.e., numbers) that a
         git-like system based around set theory with separate indices
         would perform as well.

 2. you want to perform some operations to which the hierarchy is not
    well-suited. In this case, I think you can get by with the same
    solution you have proposed already: indices external to the data
    structure (in fact, this is exactly what Google is doing: taking
    hierarchical URLs and indexing them in different ways).

    Have you taken a look at the pack v4 work by Shawn and Nicolas? It
    is an attempt to build such indices at pack time (but keeping the
    core git data structure intact).
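
As a rough illustration of the "indices external to the data structure"
idea in point 2, here is a Python sketch that derives a lookup table from
a hierarchical tree without modifying it. It is only a toy and says
nothing about how pack v4 is actually designed. The nested-dict tree and
the names walk and build_basename_index are hypothetical.

```python
from collections import defaultdict

def walk(tree: dict, prefix: str = ""):
    # Yield every file path in a nested dict, where a dict value is a
    # subdirectory and anything else is file content.
    for name in sorted(tree):
        entry = tree[name]
        path = prefix + name
        if isinstance(entry, dict):
            yield from walk(entry, path + "/")
        else:
            yield path

def build_basename_index(tree: dict) -> dict:
    # Map each file's basename to all full paths that carry it.
    # The index is derived data: it can be dropped and rebuilt at any
    # time, and the tree itself is never touched.
    index = defaultdict(list)
    for path in walk(tree):
        index[path.rsplit("/", 1)[-1]].append(path)
    return dict(index)

tree = {
    "Makefile": b"",
    "src": {"Makefile": b"", "main.c": b""},
}
index = build_basename_index(tree)
print(index["Makefile"])
```

A query like "every file named Makefile" is awkward for the hierarchy
alone, but becomes a single lookup once such an index exists.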

-Peff