On Fri, 6 Sep 2013, Junio C Hamano wrote:

> Nicolas Pitre <nico@xxxxxxxxxxx> writes:
>
> > OK.  If I understand correctly, the committer time stamp is more
> > important than the author's, right?
>
> Yeah, it matters a lot more when doing timestamp based traversal
> without the reachability bitmaps.
>
> > ... may I suggest keeping the tree reference first.  That
> > is easy to skip over if you don't need it,...
> > ... Whereas, for a checkout where only the tree info is needed, if it is
> > located after the list of parents, then the above needs to be done for
> > all those parents and the committer time.
>
> Hmm.  I wonder if that is a really good trade-off.
>
> "checkout" is to parse a single commit object and grab the "tree"
> field, while "log" is to parse millions of commit objects to grab
> their "parents" and "committer timestamp" fields ("log path/spec"
> needs to grab "tree", too, so that does not make "tree" extremely
> uncommon compared to the other two fields, though).
>
> I dunno.

I've therefore settled in the middle.  The patch description now looks
like:

| This goes as follows:
|
| - Tree reference: either variable length encoding of the index
|   into the SHA1 table or the literal SHA1 prefixed by 0 (see
|   encode_sha1ref()).
|
| - Parent count: variable length encoding of the number of parents.
|   This is normally going to occupy a single byte but doesn't have to.
|
| - List of parent references: a list of encode_sha1ref() encoded
|   references, or nothing if the parent count was zero.
|
| - Committer time stamp: variable length encoded.  Year 2038 ready!
|   Unlike the canonical representation, this is stored close to the
|   list of parents so the important data for history traversal can be
|   retrieved without parsing the rest of the object.
|
| - Committer reference: variable length encoding of an index into the
|   ident dictionary table which also covers the time zone.  To make
|   the overall encoding efficient, the ident table is sorted by usage
|   frequency so the most used entries are first and require the shortest
|   index encoding.
|
| - Author time stamp: encoded as a difference against the committer
|   time stamp, with the LSB used to indicate commit time is behind
|   author time.
|
| - Author reference: same as committer reference.
|
| The remainder of the canonical commit object content is then zlib
| compressed and appended to the above.

I also updated the documentation patch accordingly in my tree.


Nicolas
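
For readers following along, the field order in the patch description
can be sketched roughly as below.  This is only an illustration in
Python, not the code from the patch series.  Several things here are
assumptions: the varint is a generic little-endian base-128 encoding
(the actual byte layout may differ), the SHA1 table is modelled as a
dict with 1-based indexes so that a leading 0 byte can flag a literal
SHA1, and the sign convention for the author time delta is one reading
of "LSB indicates commit time is behind author time".  All names other
than encode_sha1ref() are hypothetical.

```python
def encode_varint(value):
    """Generic base-128 varint (assumed layout, low 7 bits first)."""
    if value < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Return (value, position just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_sha1ref(sha1, sha1_table):
    """Table index if known, else a 0 byte followed by the raw SHA1.

    sha1_table maps 20-byte SHA1s to 1-based indexes (an assumption),
    so an encoded index can never start with a 0 byte.
    """
    index = sha1_table.get(sha1)
    if index is None:
        return b"\x00" + sha1
    return encode_varint(index)


def encode_author_time(committer_time, author_time):
    """Delta against the committer time; LSB set when author is later."""
    delta = committer_time - author_time
    if delta >= 0:
        return delta << 1
    return ((-delta) << 1) | 1


def decode_author_time(committer_time, encoded):
    magnitude = encoded >> 1
    if encoded & 1:
        return committer_time + magnitude
    return committer_time - magnitude


def encode_commit_header(tree, parents, committer_time, committer_ident,
                         author_time, author_ident, sha1_table):
    """Emit the fields in the order listed in the patch description.

    committer_ident and author_ident are indexes into the ident table.
    The zlib-compressed remainder of the commit is not handled here.
    """
    out = bytearray()
    out += encode_sha1ref(tree, sha1_table)
    out += encode_varint(len(parents))
    for parent in parents:
        out += encode_sha1ref(parent, sha1_table)
    out += encode_varint(committer_time)
    out += encode_varint(committer_ident)
    out += encode_varint(encode_author_time(committer_time, author_time))
    out += encode_varint(author_ident)
    return bytes(out)
```

With this layout a history walker only has to skip one sha1ref (the
tree) before it reaches the parent list and the committer time, while a
checkout can stop after the very first field.  The common case where
author and committer times are equal encodes the author time as a
single zero byte.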