On 9/4/07, Martin Langhoff <martin.langhoff@xxxxxxxxx> wrote:
> On 9/4/07, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> > > Yes. For performance reasons, since a simple commit would kill you in any
> > > reasonably sized repo.
> >
> > That's not an obvious conclusion. A new commit is just a series of
>
> Hi Jon!
>
> If you search the archives you'll find Linus explaining that the
> initial git had all the directory structure in one single "tree"
> object that held all the paths, no matter how deep. The problem with
> that was that every commit generated a huge new tree object, so he
> switched to the current "nested trees" structure, which also has the
> nice feature of speeding up diffs/merges if whole subtrees haven't
> changed.

In my scheme the commit is only a list of SHAs. The paths are stored
as attributes of the file objects. Commits are just edits to the list
of SHAs in the commit objects. If these lists are kept sorted, then
the delta should be tiny: just the info on the adds/deletes to the
list. This is very different from a single tree blob that contains all
of the paths.

Diffing two trees in this scheme is quite fast. Just get their commit
objects into RAM and compare the lists of SHAs.

> > edits to the previous commit. Start with the previous commit, edit it,
> > delta it and store it. Storing of the file objects is the same. Why
> > isn't this scheme faster than the current one?
>
> I think you're a bit confused between 2 different things:
>
> - git is _snapshot_ based, so every commit-tree-blob set is
>   completely independent. The "canonical" storage is each of those
>   gzipped in .git/objects
> - however, for performance and on-disk-footprint, we delta them (very
>   efficiently I hear)

The systems are essentially the same with a little reorganization. In
the current system the paths and SHAs for a commit are spread over the
tree nodes. In my scheme the path info is moved into the file object
nodes and the SHA list is in the commit node.
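
To make the proposal concrete, here is a minimal sketch of the scheme described above. It is not how git stores anything today. The names FileObject, make_commit and diff_commits are hypothetical, and hashing the path together with the content is an assumption of the sketch, since the path is meant to be an attribute of the file object.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileObject:
    """Hypothetical file object that carries its own path."""
    path: str
    content: bytes

    @property
    def sha(self) -> str:
        # Assumption: the path is part of what gets hashed.
        data = self.path.encode() + b"\0" + self.content
        return hashlib.sha1(data).hexdigest()


def make_commit(files):
    """A commit in this scheme is only a sorted list of file-object SHAs."""
    return sorted(f.sha for f in files)


def diff_commits(old, new):
    """Walk two sorted SHA lists in step and return (removed, added)."""
    removed, added = [], []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            i += 1
            j += 1
        elif old[i] < new[j]:
            removed.append(old[i])
            i += 1
        else:
            added.append(new[j])
            j += 1
    removed.extend(old[i:])
    added.extend(new[j:])
    return removed, added


# Two commits that differ in a single file.
readme = FileObject("README", b"hello\n")
main_v1 = FileObject("src/main.c", b"int main(void) { return 0; }\n")
main_v2 = FileObject("src/main.c", b"int main(void) { return 1; }\n")

commit1 = make_commit([readme, main_v1])
commit2 = make_commit([readme, main_v2])
removed, added = diff_commits(commit1, commit2)
```

Here removed holds the SHA of the old src/main.c and added holds the SHA of the new one; the README entry is untouched. The walk reads each list once, so the cost grows with the total number of files in the two commits rather than with the number of files that changed.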
git still works exactly as it has before. I just moved things around
in the storage system. The only thing that should be impacted is
performance.

>
> So if you ask the GIT APIs about a tree, you end up dealing with the
> nested trees I describe. Similarly, if you ask for a blob, you get the
> blob. But internally git _is_ delta-compressing them.
>
> It's not compressing them immediately -- only when you run git gc. But
> from an API perspective, you don't have to worry about that.
>
> HTH
>
>
> martin
>

--
Jon Smirl
jonsmirl@xxxxxxxxx