On Tue, Sep 04, 2007 at 01:44:47PM -0400, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 9/4/07, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > "Jon Smirl" <jonsmirl@xxxxxxxxx> writes:
> >
> > > Another way of looking at the problem,
> > >
> > > Let's build a full-text index for git. You put a string into the index
> > > and it returns the SHAs of all the file nodes that contain the string.
> > > How do I recover the path names of these SHAs?
> >
> > That question does not make much sense without specifying "which
> > commit's path you are talking about".
> >
> > If you want to encode such "contextual information" in addition
> > to "contents", you could do so, but you essentially need to
> > record commit + pathname + mode bits + contents as "blob" and
> > hash that to come up with a name.
>
> I left the details out of the full-text example to make it more
> obvious that we can't recover the path names.
>
> Doing this type of analysis may point out that even more fields are
> missing from the blob table such as commit id.
>
> The current data store design is not very flexible. Databases solved
> the flexibility problem long ago. I'm just wondering if we should
> steal some good ideas out of the database world and apply them to git.
> Ten years from now we may have 100GB git databases and really wish we
> had more flexible ways of querying them.
>
> The reason databases don't encode the fields into the index is that
> you can only have a single index on the table if you do that.
> Databases do sometimes duplicate the field in both the index and the
> table. Databases also have the property that indexes are just a cache
> and can be dropped at any time.

The big difference between a database and git is that a database is a
general purpose tool. git has a much more restricted scope. As such,
it doesn't need *that much* flexibility.
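
To make the quoted point about path names concrete, here is a rough
sketch (not git code, just an illustration). It names blobs the way git
does, by hashing a "blob <size>" header plus the contents, and builds a
toy full-text index over them. The HISTORY dict, build_text_index() and
paths_for() are made up for the example; only the blob naming scheme is
git's. The same SHA shows up under different paths in different
commits, so a path lookup only works once you say which commit you mean.

```python
import hashlib
from collections import defaultdict


def blob_sha(content: bytes) -> str:
    # git names a blob by SHA-1 over "blob <size>" + NUL + contents.
    # Neither the path nor the commit is part of the hashed data.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical toy history (assumption for illustration, not real git data):
# commit name -> {path: contents}.
HISTORY = {
    "commit-1": {"README": b"hello\n", "docs/intro.txt": b"hello\n"},
    "commit-2": {"README.txt": b"hello\n", "docs/intro.txt": b"hello, world\n"},
}


def build_text_index(history):
    # Full-text index in the sense of the thread: word -> set of blob SHAs.
    index = defaultdict(set)
    for tree in history.values():
        for content in tree.values():
            for word in content.decode().split():
                index[word].add(blob_sha(content))
    return index


def paths_for(sha, history, commit):
    # Recovering paths needs a commit as context; the SHA alone is not enough.
    return sorted(p for p, c in history[commit].items() if blob_sha(c) == sha)


index = build_text_index(HISTORY)
hello_sha = blob_sha(b"hello\n")

print(index["hello"])                              # a single blob SHA
print(paths_for(hello_sha, HISTORY, "commit-1"))   # two paths in this commit
print(paths_for(hello_sha, HISTORY, "commit-2"))   # a different path here
```

Note that the reverse lookup is kept outside the blob name, much like
the droppable database index described in the quoted text; whether that
is worth doing for git is a separate question.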

Mike