On Wed, 28 Nov 2007, Nicolas Pitre wrote:
>
> Tree objects aren't all together.  Related blob objects are interlaced
> with those tree objects.

Yeah, I noticed that a few minutes after saying this.

> But for a checkout that should actually correspond to a nice linear
> access.

For the initial check-out, yes. But the thing I timed was just a plain "git checkout", which won't actually do any of the blobs if they already exist checked-out (which I obviously had), which explains the non-dense patterns.

The reason I care about "git checkout" (which is totally uninteresting in itself) is that it is a trivial use-case that fairly closely approximates two common cases that are *not* uninteresting: switching branches with most files unaffected and a fast-forward merge (both of which are the "two-way merge" special case).

I also suspect it is pretty close to a real three-way merge (again, with just a few files changed).

IOW, there's a lot of these "tree operations" that actually leave 99% of the tree totally unchanged, at least in the kernel. Even a fairly big merge tends to change just a few hundred files. And when there are 23,000 files in the tree, a few hundred files is a fairly small percentage!

So it's actually fairly common to have "git checkout"-like behaviour with no blobs needing to be updated, and the "initial checkout" is in fact likely a less usual case.

I wonder if we should make the pack-file have all the object types in separate regions (we already do that for commits, since "git rev-list" kind of operations are dense in the commit).

Making the tree objects dense (the same way the commit objects are) might also conceivably speed up "git blame" and path history simplification, since those also tend to be "dense" in the tree history but don't actually look at the blobs themselves until they change.
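To make the "dense vs non-dense" point concrete, here is a toy model only. It is not git's pack format or packing code: the function names, the fixed number of blobs per tree, and the "one slot per object" layout are all assumptions made for illustration (real objects are variable-sized and delta-compressed). It just compares how wide a region a trees-only walk has to touch when blobs are interlaced with trees versus when trees are grouped together.

```python
def pack_order(n_trees, blobs_per_tree, grouped):
    """Return a toy pack layout as a list of (kind, tree_id) slots.

    Hypothetical model: each tree is followed by its blobs (interlaced),
    or all trees are moved to the front (grouped by object type).
    """
    interlaced = []
    for t in range(n_trees):
        interlaced.append(("tree", t))
        interlaced.extend(("blob", t) for _ in range(blobs_per_tree))
    if not grouped:
        return interlaced
    # Stable sort: trees first, blobs after, relative order preserved.
    return sorted(interlaced, key=lambda obj: obj[0] != "tree")


def tree_span(order):
    """Number of slots between the first and last tree, inclusive."""
    positions = [i for i, (kind, _) in enumerate(order) if kind == "tree"]
    return positions[-1] - positions[0] + 1


if __name__ == "__main__":
    for grouped in (False, True):
        span = tree_span(pack_order(4, 3, grouped))
        print("grouped" if grouped else "interlaced", span)
```

With four trees and three blobs each, the trees-only walk covers 13 slots in the interlaced layout and 4 in the grouped one. Whether that translates into a real speedup depends on object sizes, deltas and the page cache, none of which this sketch models.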
Linus