On Thu, 5 Sep 2013, Junio C Hamano wrote: > Nicolas Pitre <nico@xxxxxxxxxxx> writes: > > > This goes as follows: > > > > - Tree reference: either variable length encoding of the index > > into the SHA1 table or the literal SHA1 prefixed by 0 (see > > encode_sha1ref()). > > > > - Parent count: variable length encoding of the number of parents. > > This is normally going to occupy a single byte but doesn't have to. > > > > - List of parent references: a list of encode_sha1ref() encoded > > references, or nothing if the parent count was zero. > > > > - Author reference: variable length encoding of an index into the author > > identifier dictionary table which also covers the time zone. To make > > the overall encoding efficient, the author table is sorted by usage > > frequency so the most used names are first and require the shortest > > index encoding. > > > > - Author time stamp: variable length encoded. Year 2038 ready! > > > > - Committer reference: same as author reference. > > > > - Committer time stamp: same as author time stamp. > > > > The remainder of the canonical commit object content is then zlib > > compressed and appended to the above. > > > > Rationale: The most important commit object data is densely encoded while > > requiring no zlib inflate processing on access, and all SHA1 references > > are most likely to be direct indices into the pack index file requiring > > no SHA1 search into the pack index file. > > May I suggest a small change to the above, though. > > Reorder the entries so that Parent count, list of parents and > committer time stamp come first in this order, and then the rest. > > That way, commit.c::parse_commit() could populate its field lazily > with parsing only the very minimum set of fields, and then the > revision walker, revision.c::add_parents_to_list(), can find where > in the priority queue to add the parents to the list of commits to > be processed while still keeping the object partially parsed. If a > commit is UNINTERESTING, no further parsing needs to be done. OK. If I understand correctly, the committer time stamp is more important than the author's, right? Because my latest change in the format was to make the former as a difference against the later and that would obviously have to be reversed. Also, to keep some kind of estetic symetry (if such thing may exist in a raw byte format) may I suggest keeping the tree reference first. That is easy to skip over if you don't need it, something like: if (!*ptr) ptr += 1 + 20; else while (*ptr++ & 128); Whereas, for a checkout where only the tree info is needed, if it is located after the list of parents, then the above needs to be done for all those parents and the committer time. Nicolas -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html