Re: Git's database structure

Mike Hommey <mh@xxxxxxxxxxxx> · Tue, 4 Sep 2007 20:04:29 +0200

On Tue, Sep 04, 2007 at 01:44:47PM -0400, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 9/4/07, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > "Jon Smirl" <jonsmirl@xxxxxxxxx> writes:
> >
> > > Another way of looking at the problem,
> > >
> > > Let's build a full-text index for git. You put a string into the index
> > > and it returns the SHAs of all the file nodes that contain the string.
> > > How do I recover the path names of these SHAs?
> >
> > That question does not make much sense without specifying "which
> > commit's path you are talking about".
> >
> > If you want to encode such "contextual information" in addition
> > to "contents", you could do so, but you essentially need to
> > record commit + pathname + mode bits + contents as "blob" and
> > hash that to come up with a name.
> 
> I left the details out of the full-text example to make it more
> obvious that we can't recover the path names.
> 
> Doing this type of analysis may point out that even more fields are
> missing from the blob table such as commit id.
> 
> The current data store design is not very flexible. Databases solved
> the flexibility problem long ago. I'm just wondering if we should
> steal some good ideas out of the database world and apply them to git.
> Ten years from now we may have 100GB git databases and really wish we
> had more flexible ways of querying them.
> 
> The reason databases don't encode the fields into the index is that
> you can only have a single index on the table if you do that.
> Databases do sometimes duplicate the field in both the index and the
> table. Databases also have the property that indexes are just a cache
> and can be dropped at any time.

The big difference between a database and git is that a database is a
general-purpose tool. git has a much more restricted scope. As such, it
doesn't need *that much* flexibility.
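
For readers following the quoted exchange, here is a minimal sketch of why
a blob's SHA alone cannot give back a path, and why the question only makes
sense relative to a commit. It is not git's actual code. The blob name is
the SHA-1 of a "blob <size>" header, a NUL byte, and the raw contents, so
no path, mode or commit enters the hash. The names `blob_id`, `commits` and
`paths_for` are made up for illustration, and the dict merely stands in for
the real tree objects.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # A git blob is named by the SHA-1 of "blob <size>", a NUL byte,
    # and the raw contents. Nothing about path or commit is hashed.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical snapshots: commit label -> {path: file contents}.
# Real git stores this mapping in tree objects reachable from each commit.
commits = {
    "c1": {"README": b"hello\n"},
    "c2": {"README": b"hello\n", "docs/intro.txt": b"hello\n"},
}


def paths_for(blob: str, commit: str) -> list[str]:
    # Paths can only be recovered by walking one specific commit's tree.
    tree = commits[commit]
    return sorted(path for path, data in tree.items() if blob_id(data) == blob)


if __name__ == "__main__":
    sha = blob_id(b"hello\n")
    for label in commits:
        print(label, sha, paths_for(sha, label))
```

The same blob answers to one path in "c1" and two paths in "c2", which is
the "which commit's path?" objection in miniature.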

Mike