Re: Calculating tree nodes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/4/07, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 9/4/07, David Tweed <david.tweed@xxxxxxxxx> wrote:
> > On 9/4/07, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> > > Git has picked up the hierarchical storage scheme since it was built
> > > on a hierarchical file system.
> >
> > FWIW my memory is that initial git used path-to-blob lists (as you're
> > describing but without delta-ing) and tree nodes were added after a
> > couple of weeks, the motivation _at the time_ being they were a
> > natural way to dramatically reduce the size of repos.
> >
> > One of the nice things about tree nodes is that for doing a diff
> > between versions you can, to overwhelming probability, decide
> > equality/inequality of two arbitrarily deep and complicated subtrees
> > by comparing 40 characters, regardless of how remote and convoluted
> > their common ancestry. With delta chains don't you end up having to
> > trace back to a common "entry" in the history? (Of course, I don't
> > know how packs affect this - presumably there's some delta chasing to
> > get to the bare objects as well.)
>
> While it is a 40 character compare, how many disk accesses were needed
> to get those two SHAs into memory?

That's a difficult question. Clearly if you're starting on a machine
without anything cached, you're probably a bit worse off. If not and
you've got a "wide, shallow tree where changes occur clustered within
directories rather than spread uniformly throughout the tree" then
it's likely already there.

But it's all I suspect it should be looked at in relative terms: with
the respect to the delta-d list of SHA's, do you need to do more or
less disk reads to be able to compare stuff? Dunno: I'd imagine this
depends on precisely what's delta'd against what and how many of those
you need to read in order to put two entries into a comparable form.
The key point I was making was not so much the 40 characters as that
(at least with loose objects) if you can are given the top-level
SHA's, you can efficiently decide equality (a key to efficient
diffing) _without having to care how those two are related in the
history or read any extra history_.

-- 
cheers, dave tweed__________________________
david.tweed@xxxxxxxxx
Rm 124, School of Systems Engineering, University of Reading.
"we had no idea that when we added templates we were adding a Turing-
complete compile-time language." -- C++ standardisation committee
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux