On 9/4/07, Jon Smirl <jonsmirl@xxxxxxxxx> wrote: > On 9/4/07, David Tweed <david.tweed@xxxxxxxxx> wrote: > > On 9/4/07, Jon Smirl <jonsmirl@xxxxxxxxx> wrote: > > > Git has picked up the hierarchical storage scheme since it was built > > > on a hierarchical file system. > > > > FWIW my memory is that initial git used path-to-blob lists (as you're > > describing but without delta-ing) and tree nodes were added after a > > couple of weeks, the motivation _at the time_ being they were a > > natural way to dramatically reduce the size of repos. > > > > One of the nice things about tree nodes is that for doing a diff > > between versions you can, to overwhelming probability, decide > > equality/inequality of two arbitrarily deep and complicated subtrees > > by comparing 40 characters, regardless of how remote and convoluted > > their common ancestry. With delta chains don't you end up having to > > trace back to a common "entry" in the history? (Of course, I don't > > know how packs affect this - presumably there's some delta chasing to > > get to the bare objects as well.) > > While it is a 40 character compare, how many disk accesses were needed > to get those two SHAs into memory? That's a difficult question. Clearly if you're starting on a machine without anything cached, you're probably a bit worse off. If not and you've got a "wide, shallow tree where changes occur clustered within directories rather than spread uniformly throughout the tree" then it's likely already there. But it's all I suspect it should be looked at in relative terms: with the respect to the delta-d list of SHA's, do you need to do more or less disk reads to be able to compare stuff? Dunno: I'd imagine this depends on precisely what's delta'd against what and how many of those you need to read in order to put two entries into a comparable form. The key point I was making was not so much the 40 characters as that (at least with loose objects) if you can are given the top-level SHA's, you can efficiently decide equality (a key to efficient diffing) _without having to care how those two are related in the history or read any extra history_. -- cheers, dave tweed__________________________ david.tweed@xxxxxxxxx Rm 124, School of Systems Engineering, University of Reading. "we had no idea that when we added templates we were adding a Turing- complete compile-time language." -- C++ standardisation committee - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html