Re: heads-up: git-index-pack in "next" is broken

Nicolas Pitre <nico@xxxxxxx> · Tue, 17 Oct 2006 17:21:39 -0400 (EDT)

On Tue, 17 Oct 2006, Linus Torvalds wrote:

> 
> 
> On Tue, 17 Oct 2006, Nicolas Pitre wrote:
> > On Tue, 17 Oct 2006, Sergey Vlasov wrote:
> > > 
> > > Yes, on x86_64 this is 24 because of 8-byte alignment for longs:
> > 
> > Ah bummer.  Then this is most likely the cause.  And here's a simple 
> > fix (Junio please confirm):
> 
> Why do you use "unsigned long" in the first place?

Because offsets into packs are expressed as unsigned long everywhere 
else (except in the current pack index on-disk format).

> For some structure like this, it sounds positively wrong. Pack-files 
> should be architecture-neutral, which means that they shouldn't depend on 
> word-size, and they should be in some neutral byte-order.

But they do.  Please consider this code:

	case OBJ_OFS_DELTA:
		memset(delta_base, 0, sizeof(*delta_base));
		c = pack_base[pos++];
		base_offset = c & 127;
		while (c & 128) {
			base_offset += 1;
			if (!base_offset || base_offset & ~(~0UL >> 7))
				bad_object(offset, "offset value overflow for delta base object");
			if (pos >= pack_limit)
				bad_object(offset, "object extends past end of pack");
			c = pack_base[pos++];
			base_offset = (base_offset << 7) + (c & 127);
		}
		delta_base->offset = offset - base_offset;
		if (delta_base->offset >= offset)
			bad_object(offset, "delta base offset is out of bound");
		break;

Do you see anything inerently wrong in this code?  The above is already 
64-bit ready such that it'll just work on 64-bit archs and will display 
a sensible message if a 32-bit arch encounter a pack larger than 4GB.  
But the on-disk pack format has no limitation what so ever.

> Quite frankly, this all makes me go "Eww..". The original pack-file (well, 
> v2) format was well-defined and had none of these issues. In contrast, the 
> new code in 'next' is just _ugly_.

I beg to differ.  Please reconsider in light of the above.

> And maybe it's just me, but I consider unions to be bug-prone on their 
> own. The "master" branch has exactly two unions: the "grep_expr" structure 
> contains one (where the union member is clearly defined by the node type 
> in that structure), and object.c has a "union any_object" that _literally_ 
> exists as purely an allocation size issue (ie it is used _only_ to 
> allocate the maximum size of any of the possible structures).
> 
> In contrast, the new union introduced in "next" is just horrid. There's 
> not even any way to know which member to use, except apparently that it 
> expects that a SHA1 is never zero in the last 12 bytes. Which is probably 
> true, but still - that's some ugly stuff.

This union should be looked at just like a sortable hash pointing to a 
base object so that deltas with the same base object can be sorted 
together.  And the field to use is well defined of course: deltas with 
sha1 to base use the sha1 member, deltas with offset to base use the 
offset member.  This hash, together with the delta type, constitute a 
tuple guaranteed to be unique so there can't be any confusion.

> Is this something you want to bet a big project on?

I don't see why not.

Nicolas
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html