Re: heads-up: git-index-pack in "next" is broken

On Tue, 17 Oct 2006, Linus Torvalds wrote:

> 
> 
> On Tue, 17 Oct 2006, Nicolas Pitre wrote:
> > 
> > Because offsets into packs are expressed as unsigned long everywhere 
> > else (except in the current pack index on-disk format).
> 
> Until your work, that "unsigned long" was totally just an internal thing 
> that didn't actually bleed into anything else.

And would you please explain how my work changes that state of affairs?
Sorry, but I don't follow you here, and _I_ wrote that code.

> > > For some structure like this, it sounds positively wrong. Pack-files 
> > > should be architecture-neutral, which means that they shouldn't depend on 
> > > word-size, and they should be in some neutral byte-order.
> > 
> > But they do.  Please consider this code:
> 
> Right. The pack-file itself. But the code that actually _generates_ it 
> mixes things in alarming ways.

???

> > > In contrast, the new union introduced in "next" is just horrid. There's 
> > > not even any way to know which member to use, except apparently that it 
> > > expects that a SHA1 is never zero in the last 12 bytes. Which is probably 
> > > true, but still - that's some ugly stuff.
> > 
> > This union should be looked at just like a sortable hash pointing to a 
> > base object so that deltas with the same base object can be sorted 
> > together.
> 
> .. and it sorts _differently_ on a big-endian vs little-endian thing, 
> doesn't it?

Sure.  But who cares?  The sorting is just there to 1) perform binary 
searches on the list of deltas based on a given object, and 2) find the 
list of all deltas with the same base object.
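
For illustration only, here is a minimal C sketch of that kind of use. It is 
not the actual index-pack code: `base_key`, `struct delta`, `key_from_offset`, 
`sort_deltas` and `find_deltas` are hypothetical names, and the 20-byte key 
width is an assumption taken from the size of a SHA1 digest.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define KEY_SZ 20	/* assumed key width: the size of a SHA1 digest */

/* Hypothetical sortable key: either a SHA1 or a zero-padded pack offset. */
union base_key {
	unsigned char sha1[KEY_SZ];
	unsigned long offset;
};

struct delta {
	union base_key base;	/* identifies the base object */
	int id;			/* illustrative payload */
};

/* Build an offset key; unused bytes are zeroed so memcmp() is well defined. */
static void key_from_offset(union base_key *k, unsigned long off)
{
	memset(k, 0, sizeof(*k));
	k->offset = off;
}

/* Raw byte comparison: the order is arbitrary but consistent on one host. */
static int cmp_delta(const void *a, const void *b)
{
	const struct delta *da = a, *db = b;
	return memcmp(&da->base, &db->base, KEY_SZ);
}

static void sort_deltas(struct delta *d, size_t n)
{
	qsort(d, n, sizeof(*d), cmp_delta);
}

/*
 * Binary search for the first delta whose base equals `key`, then count
 * the run of equal keys.  Returns the count and stores the start index.
 */
static size_t find_deltas(const struct delta *d, size_t n,
			  const union base_key *key, size_t *first)
{
	size_t lo = 0, hi = n, count = 0;

	assert(first != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (memcmp(&d[mid].base, key, KEY_SZ) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	*first = lo;
	while (lo + count < n &&
	       memcmp(&d[lo + count].base, key, KEY_SZ) == 0)
		count++;
	return count;
}
```

After `sort_deltas()`, all deltas sharing a base sit next to each other, so 
`find_deltas()` returns them as one contiguous run. Where that run lands in 
the array depends on the host byte order; which deltas are in it does not.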

> So now the sort order depends on endianness and/or wordsize. That just 
> sounds really really wrong.

Again, who cares?  That ordering doesn't influence any data produced by 
the tool.  It is an internal and private strategy to speed up the 
_local_ _searching_ process.  It could be replaced by a dumb linear 
list walk if you wish, and the end result, i.e. the produced pack index, 
would be exactly the same (a significant slowdown notwithstanding).
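
A small C sketch of that equivalence, under stated assumptions: `struct entry`, 
`cmp_bytes`, `cmp_value`, `linear_lookup` and `sorted_lookup` are made-up names 
for illustration, not anything from index-pack, and the result is reduced to a 
bitmask of entry ids so it can be compared directly. It looks up the same base 
three ways: a linear walk, a sort by raw bytes (host-endian dependent), and a 
sort by numeric value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: a base key plus a small id (must be < 32 here). */
struct entry {
	unsigned long base;
	int id;
};

/* Order by raw bytes: the resulting order depends on host endianness. */
static int cmp_bytes(const void *a, const void *b)
{
	const struct entry *ea = a, *eb = b;
	return memcmp(&ea->base, &eb->base, sizeof(ea->base));
}

/* Order by numeric value: the same order on every host. */
static int cmp_value(const void *a, const void *b)
{
	const struct entry *ea = a, *eb = b;
	return (ea->base > eb->base) - (ea->base < eb->base);
}

/* Dumb linear walk: collect the ids of all entries with the given base. */
static unsigned linear_lookup(const struct entry *e, size_t n,
			      unsigned long base)
{
	unsigned mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (e[i].base == base)
			mask |= 1u << e[i].id;
	return mask;
}

/* Sort with `cmp`, binary-search one hit, then widen to the whole run. */
static unsigned sorted_lookup(struct entry *e, size_t n, unsigned long base,
			      int (*cmp)(const void *, const void *))
{
	struct entry key = { base, 0 };
	struct entry *hit;
	unsigned mask = 0;
	size_t i, lo, hi;

	qsort(e, n, sizeof(*e), cmp);
	hit = bsearch(&key, e, n, sizeof(*e), cmp);
	if (!hit)
		return 0;
	lo = hi = (size_t)(hit - e);
	while (lo > 0 && cmp(&e[lo - 1], &key) == 0)
		lo--;
	while (hi + 1 < n && cmp(&e[hi + 1], &key) == 0)
		hi++;
	for (i = lo; i <= hi; i++) {
		assert(e[i].id >= 0 && e[i].id < 32);
		mask |= 1u << e[i].id;
	}
	return mask;
}
```

Whichever comparison function is passed in, `sorted_lookup()` returns the same 
set of ids as `linear_lookup()`; only the time spent finding them differs.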

So let me summarize:

 - the union is a hash.

 - the hash is either an offset value or a sha1 digest.

 - this hash is used for fast object lookup _only_.

 - it does sort differently on big vs little endian machines.

 - but we don't care at all because:

    - it is a private algorithmic thing that doesn't "bleed" into any 
      _real_ data structure,

    - it doesn't have any influence on the format of the end result,

    - it is only a runtime abstraction and nothing else, and

    - it never gets into the pack nor the pack index themselves.

Do you still have issues with that?


Nicolas