Re: [PATCH 10/38] pack v4: commit object encoding

On Fri, 6 Sep 2013, Junio C Hamano wrote:

> Nicolas Pitre <nico@xxxxxxxxxxx> writes:
> 
> > OK.  If I understand correctly, the committer time stamp is more 
> > important than the author's, right?
> 
> Yeah, it matters a lot more when doing timestamp based traversal
> without the reachability bitmaps.
> 
> > ... may I suggest keeping the tree reference first.  That 
> > is easy to skip over if you don't need it,...
> > ... Whereas, for a checkout where only the tree info is needed, if it is 
> > located after the list of parents, then the above needs to be done for 
> > all those parents and the committer time.
> 
> Hmm.  I wonder if that is a really good trade-off.
> 
> "checkout" is to parse a single commit object and grab the "tree"
> field, while "log" is to parse millions of commit objects to grab
> their "parents" and "committer timestamp" fields ("log path/spec"
> needs to grab "tree", too, so that does not make "tree" extremely
> uncommon compared to the other two fields, though).
> 
> I dunno.

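I've therefore settled on a middle ground: the tree reference stays
first, and the committer time stamp now directly follows the list of
parents.  The patch description now looks like: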

|    This goes as follows:
|
|    - Tree reference: either variable length encoding of the index
|      into the SHA1 table or the literal SHA1 prefixed by 0 (see
|      encode_sha1ref()).
|
|    - Parent count: variable length encoding of the number of parents.
|      This is normally going to occupy a single byte but doesn't have to.
|
|    - List of parent references: a list of encode_sha1ref() encoded
|      references, or nothing if the parent count was zero.
|
|    - Committer time stamp: variable length encoded.  Year 2038 ready!
|      Unlike the canonical representation, this is stored close to the
|      list of parents so the important data for history traversal can be
|      retrieved without parsing the rest of the object.
|
|    - Committer reference: variable length encoding of an index into the
|      ident dictionary table which also covers the time zone.  To make
|      the overall encoding efficient, the ident table is sorted by usage
|      frequency so the most used entries are first and require the shortest
|      index encoding.
|
|    - Author time stamp: variable length encoding of the difference
|      against the committer time stamp, with the LSB set when the
|      commit time is behind (earlier than) the author time.
|
|    - Author reference: same as committer reference.
|
|    The remainder of the canonical commit object content is then zlib
|    compressed and appended to the above.
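
To illustrate the two time stamp fields described above, here is a minimal
sketch in Python.  It is not taken from the patch: the helper names are
hypothetical, and the base-128 little-endian varint used here is only an
assumption standing in for whatever variable length encoding the series
actually uses.  Only the "difference with the sign in the LSB" idea comes
from the description.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint.

    Assumption: little-endian groups of 7 bits with a continuation bit.
    The real pack v4 encoding may differ.
    """
    if value < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # last byte, continuation bit clear
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode a varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_author_delta(commit_time, author_time):
    """Author time as a difference against the committer time.

    The magnitude is shifted left by one; the LSB is set when the
    commit time is behind the author time.
    """
    if commit_time >= author_time:
        return (commit_time - author_time) << 1
    return ((author_time - commit_time) << 1) | 1


def decode_author_delta(commit_time, delta):
    """Recover the author time from the committer time and the delta."""
    diff = delta >> 1
    if delta & 1:
        return commit_time + diff     # commit time was behind author time
    return commit_time - diff
```

In the common case the author time is equal to or slightly earlier than the
committer time, so the delta is small and fits in a byte or two, while the
committer time itself can grow past 32 bits without any change of format.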

I also updated the documentation patch accordingly in my tree.


Nicolas



