Re: Calculating tree nodes

Daniel Hulme <st@xxxxxxxxx> · Tue, 4 Sep 2007 17:20:19 +0100

On Mon, Sep 03, 2007 at 11:26:30PM -0400, Jon Smirl wrote:
> This is something that has always bugged me about file systems. File
> systems force hierarchical naming due to their directory structure.
> There is no reason they have to work that way. Google is an example of
> a giant file system that works just fine without hierarchical
> directories. The full path should be just another attribute on the
> file. If you want a hierarchical index into the file system you can
> generate it by walking the files or using triggers. But you could also
> delete the hierarchical directory and replace it with something else
> like a full text index. Directories would become a computationally
> generated cache, not a critical part of the file system. But this is a
> git list so I shouldn't go too far off into file system design.

Am I the only one who thinks that this idea of moving filenames from
tree objects into blobs does the *opposite* of what you're trying to
achieve?

It seems, though I could be completely misinterpreting what you're
saying, that you want to be able to get rid of directories and replace
them with some other index into your files: maybe a full-text index,
maybe a spatial index for geographic data, maybe something else
entirely. As things stand, you could do that by editing the core to
introduce a new object type 'fulltext' whose contents maybe look like

aardvark <sha1 of a blob> <sha1 of another blob>
abacus <sha1>
...
zebra <yet another sha1> <maybe the same sha1 I mentioned before>

or even something hierarchical, with each index mapping from the first
letter of the index term to the sha1 of another index, which in turn
maps second letters, and so on. Whatever. The point is, it works
parallel to tree. You could have the blobs referenced by your fulltext
object also be referenced by a tree object. If you really don't like
directory trees, you can dispense with tree objects in your repo
entirely. Either way you have a mapping from keys to blobs.

Then you could have your commits and tags include sha1's of fulltext
objects rather than (or as well as) tree objects, and you get your wish.
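
To make that concrete, here is a rough sketch in Python of a toy object
store, not git's actual code. The blob hashing follows git's
"<type> <len>\0" header convention; everything else is an assumption for
illustration: the 'fulltext' object type, its one-term-per-line layout,
the simplified text form of the tree, the "fulltext" line in the commit,
and the helper names put, build_fulltext and lookup.

```python
import hashlib

store = {}  # sha1 hex digest -> (object type, payload bytes)

def put(obj_type, payload):
    """Store an object under sha1('<type> <len>\\0' + payload)."""
    header = f"{obj_type} {len(payload)}\0".encode()
    sha = hashlib.sha1(header + payload).hexdigest()
    store[sha] = (obj_type, payload)
    return sha

# Two blobs: contents only, no filenames attached.
a = put("blob", b"the aardvark met a zebra\n")
b = put("blob", b"an abacus for the zebra\n")

# A tree maps paths to blobs (simplified text layout, not git's real one).
tree = put("tree", f"notes/a.txt {a}\nnotes/b.txt {b}\n".encode())

def build_fulltext(blob_shas):
    """Hypothetical 'fulltext' object: one line per term, then blob sha1s."""
    index = {}
    for sha in blob_shas:
        for word in set(store[sha][1].decode().split()):
            index.setdefault(word, []).append(sha)
    lines = [f"{word} {' '.join(shas)}" for word, shas in sorted(index.items())]
    return put("fulltext", ("\n".join(lines) + "\n").encode())

def lookup(fulltext_sha, term):
    """Return the blob sha1s listed for a term, or an empty list."""
    for line in store[fulltext_sha][1].decode().splitlines():
        word, *shas = line.split()
        if word == term:
            return shas
    return []

fulltext = build_fulltext([a, b])

# A commit that references both indexes over the same blobs.
commit = put("commit", f"tree {tree}\nfulltext {fulltext}\n".encode())
```

Here lookup(fulltext, "zebra") finds both blobs and the tree finds the
same blobs by path. Neither index is privileged, and the blobs
themselves know nothing about either.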

OTOH, imagine if you move filenames into the blobs. Now, no matter what
other index types you introduce, they'll always be secondary to the
traditional, path-and-filename method of finding files. Crucially, you
can't introduce new blobs into the repo without giving them filenames.
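
A small sketch of that constraint, again in Python and again only an
illustration. blob_id hashes content the way git hashes a blob;
named_blob_id is a hypothetical variant that stores the path inside the
blob, and its "path\0content" layout is my assumption, not a proposal
anyone has made.

```python
import hashlib

def blob_id(content):
    """Id of a plain blob: sha1 over 'blob <len>\\0' + content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def named_blob_id(path, content):
    """Hypothetical blob with the filename embedded in the object."""
    if not path:
        # Nothing can be stored until someone has picked a filename.
        raise ValueError("a blob with an embedded filename needs a name")
    return blob_id(path.encode() + b"\0" + content)

data = b"same bytes\n"
plain = blob_id(data)                         # depends on content only
x = named_blob_id("docs/readme.txt", data)    # depends on the path too
y = named_blob_id("README", data)
```

With plain blobs, any index type can point at the id in plain. With the
embedded name, x and y differ even though the content is identical, and
there is no way to create the object at all without first choosing a
path for it.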

As you said in your other thread,

> Integrating indexing into the data is not normally done in a database.

But isn't this exactly what integrating filenames into blobs would do?

-- 
There is no such thing as a small specification change.
http://surreal.istic.org/            Forcing the lines through the snow.