Re: Zero padded file modes...

Nicolas Pitre <nico@xxxxxxxxxxx> · Thu, 05 Sep 2013 12:56:29 -0400 (EDT)

On Thu, 5 Sep 2013, Jeff King wrote:

> On Thu, Sep 05, 2013 at 11:18:24PM +0700, Nguyen Thai Ngoc Duy wrote:
> 
> > > There are basically two solutions:
> > >
> > >   1. Add a single-bit flag for "I am 0-padded in the real data". We
> > >      could probably even squeeze it into the same integer.
> > >
> > >   2. Have a "classic" section of the pack that stores the raw object
> > >      bytes. For objects which do not match our expectations, store them
> > >      raw instead of in v4 format. They will not get the benefit of v4
> > >      optimizations, but if they are the minority of objects, that will
> > >      only end up with a slight slow-down.
> > 
> > 3. Detect this situation and fall back to v2.
> > 
> > 4. Update v4 to allow storing raw tree entries mixing with v4-encoded
> > tree entries. This is something between (1) and (2)
> 
> I wouldn't want to do (3). At some point pack v4 may become the standard
> format, but there will be some repositories which will never be allowed
> to adopt it.
> 
> For (4), yes, that could work. But like (1), it only solves problems in
> tree entries. What happens if we have a quirky commit object that needs
> the same treatment (e.g., a timezone that does not fit into the commit
> name dictionary properly)?
> 
> > I think (4) fits better in v4 design and probably not hard to do. Nico
> > recently added a code to embed a tree entry inline, but the mode must
> > be encoded (and can't contain leading zeros). We could have another
> > code to store mode in ascii. This also makes me wonder if we might
> > have similar problems with timezones, which are also specially encoded
> > in v4..
> 
> Yeah, that might be more elegant.
> 
> > (3) is probably easiest. We need to scan through all tree entries
> > first when creating v4 anyway. If we detect any anomalies, just switch
> > > back to v2 generation. The user will be forced to rewrite history in
> > order to take full advantage of v4 (they can have a pack of weird
> > trees in v2 and the rest in v4 pack, but that's not optimal).
> 
> Splitting across two packs isn't great, though. What if v4 eventually
> becomes the normal on-the-wire format? I'd rather have some method for
> just embedding what are essentially v2 objects into the v4 pack, which
> would give us future room for handling these sorts of things.
> 
> But like I said, I haven't looked closely yet, so maybe there are
> complications with that. In the meantime, I'll defer to the judgement of
> people who know what they are talking about. :)

None of the above is particularly appealing to me.

Pack v4 has to enforce some standardization in the object encoding to be 
efficient.  Some compromises have been applied to accommodate the fixing 
of a thin pack, although I was initially tempted to simply dodge the 
issue and allow thin packs in a repository.

On this particular mode issue, I remember making a fuss when it was 
discovered, because the GitHub implementation was generating such tree 
objects at the time.

So instead of compromising the pack v4 object encoding further, I'd 
simply suggest adding a special object type which is in fact simply the 
pack v2 representation i.e. the canonical object version, deflated.  
Right now pack v4 encodes only 5 object types: commit, tree, blob, delta 
and tag.  Only the commit and tree objects have their representation 
transcoded.  So that means we only need to add native_commit and 
native_tree object types.

Then, anything that doesn't fit the strict expectation for transcoding a 
tree or a commit object is simply included as is without transcoding 
just like in pack v2.
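
To make the idea concrete, here is a rough sketch of how a packer might 
pick between the transcoded and the native representation for a tree.  
This is only an illustration, not actual pack v4 code: the function names 
are hypothetical, the type is returned as a plain string rather than a 
real type number, and the only anomaly it checks for is a mode that is 
not plain octal without a leading zero.  It assumes the canonical tree 
layout of "<octal mode> <name>\0<20-byte SHA-1>" per entry.

```python
SHA1_LEN = 20  # size of the raw SHA-1 that ends each canonical tree entry


def iter_tree_entries(raw: bytes):
    """Yield (mode, name, sha1) for each entry of a canonical tree body."""
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)        # end of the ASCII mode
        nul = raw.index(b"\0", space + 1)   # end of the entry name
        sha1 = raw[nul + 1:nul + 1 + SHA1_LEN]
        if len(sha1) != SHA1_LEN:
            raise ValueError("truncated tree entry")
        yield raw[pos:space], raw[space + 1:nul], sha1
        pos = nul + 1 + SHA1_LEN


def mode_is_canonical(mode: bytes) -> bool:
    """True for octal ASCII with no leading zero, e.g. b"40000"."""
    return (
        len(mode) > 0
        and mode[:1] != b"0"
        and all(c in b"01234567" for c in mode)
    )


def pick_tree_type(raw: bytes) -> str:
    """Return "tree" if every entry can be transcoded, else "native_tree"."""
    for mode, _name, _sha1 in iter_tree_entries(raw):
        if not mode_is_canonical(mode):
            # Hypothetical fallback: store the canonical bytes deflated,
            # exactly as pack v2 would.
            return "native_tree"
    return "tree"
```

A tree containing a "040000" entry would come back as "native_tree", 
while an otherwise identical tree with "40000" stays "tree".  A similar 
check on commit objects would select native_commit in the same way.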

Nicolas