Junio C Hamano <gitster@xxxxxxxxx> writes:

> Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> writes:
>
>> The main reason to group objects by type is to make it possible to
>> create another sha1->something mapping for a particular object type,
>> without wasting space for storing sha-1 keys again. For example, we
>> can store commit caches, tree caches... at the end of the index as
>> extensions.
>
> Why can't you do the same with a single "sorted by SHA-1" table?
>
> Not impressed yet.

The above should be "Not impressed yet, as it lacks sufficient
explanation of possible future benefits, but the idea is interesting."

For example, the reachability bitmap would want to say something like
"Traversing from commit A, these objects in this pack are reachable."
The bitmap for one commit A would logically consist of N bits for a
packfile that stores N objects (the resulting bitmap needs to be
compressed before going to disk, perhaps with RLE or something). With
the single "sorted by SHA-1" table, we can use each object's position
in that single table as its bit index, and enumerate all reachable
objects of any type in one go. With four separate tables, on the other
hand, we would need four bitmaps per commit.

Either way is _possible_, but I think the former is simpler, and the
latter makes it harder to introduce new types of objects in the
future; I do not think we have examined the possible use cases well
enough to decide that "four types is enough forever".

Either way, we would have such a bitmap (or a set of four bitmaps in
your case) for more than one commit (it is neither necessary nor
desirable to add a reachability bitmap to every commit), and such a
"reachability extension" would need to store a sequence of pairs: the
name of the commit object the bitmap (or set of four bitmaps) is
about, followed by the bitmap (or set of four bitmaps) itself. That
object name does not have to be a 20-byte SHA-1; it could be a varint
representation of the commit's offset into the "sorted by SHA-1"
table.
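To make the single-table layout concrete, here is a minimal C sketch, assuming one uncompressed bitmap per selected commit where bit i means "the object at position i of the SHA-1-sorted table is reachable". All names (struct reach_bitmap, bitmap_new, encode_reach_entry, and so on) are hypothetical and not part of git; the varint is a plain LEB128-style encoding, which is not necessarily git's own varint format, and a real extension would compress the bitmap (e.g. with RLE) before writing it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: bit i set => object at position i of the sorted table is reachable. */
struct reach_bitmap {
	size_t nbits;     /* N = number of objects in the pack, of any type */
	uint8_t *words;   /* ceil(N / 8) bytes, kept uncompressed here */
};

static struct reach_bitmap *bitmap_new(size_t nbits)
{
	struct reach_bitmap *bm = malloc(sizeof(*bm));
	if (!bm)
		return NULL;
	bm->nbits = nbits;
	bm->words = calloc((nbits + 7) / 8, 1);
	if (!bm->words) {
		free(bm);
		return NULL;
	}
	return bm;
}

static void bitmap_free(struct reach_bitmap *bm)
{
	if (bm) {
		free(bm->words);
		free(bm);
	}
}

static void bitmap_set(struct reach_bitmap *bm, size_t pos)
{
	assert(pos < bm->nbits);
	bm->words[pos / 8] |= (uint8_t)(1u << (pos % 8));
}

static int bitmap_get(const struct reach_bitmap *bm, size_t pos)
{
	return (bm->words[pos / 8] >> (pos % 8)) & 1;
}

/* One pass over one bitmap visits reachable objects of every type. */
static size_t bitmap_for_each(const struct reach_bitmap *bm,
			      void (*fn)(size_t pos, void *data), void *data)
{
	size_t count = 0;
	for (size_t pos = 0; pos < bm->nbits; pos++) {
		if (bitmap_get(bm, pos)) {
			if (fn)
				fn(pos, data);
			count++;
		}
	}
	return count;
}

/* LEB128-style varint: 7 payload bits per byte, high bit means "more follows". */
static size_t encode_varint_u64(uint64_t value, unsigned char *buf)
{
	size_t len = 0;
	do {
		unsigned char byte = value & 0x7f;
		value >>= 7;
		if (value)
			byte |= 0x80;
		buf[len++] = byte;
	} while (value);
	return len;
}

/*
 * One "reachability extension" record: the commit's position in the
 * sorted table (as a varint, instead of a 20-byte SHA-1), then its bitmap.
 * Returns the number of bytes written to out.
 */
static size_t encode_reach_entry(uint64_t commit_pos,
				 const struct reach_bitmap *bm,
				 unsigned char *out)
{
	size_t len = encode_varint_u64(commit_pos, out);
	size_t nbytes = (bm->nbits + 7) / 8;
	memcpy(out + len, bm->words, nbytes);
	return len + nbytes;
}
```

With a pack of 20 objects, a commit at position 300 would cost 2 bytes of varint plus 3 bytes of raw bitmap, versus 20 bytes for the SHA-1 alone. A separate commit-only table would shrink only the varint part, which is the trade-off weighed next.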
That varint representation would be smaller by about 3.5 bits if you
had a separate "commit only, sorted by SHA-1" table, as the total
number of objects tends to be about 10x the number of commits. For the
particular use case of "we only want to annotate commits, never other
kinds of objects", it would be a win. But without knowing what other
use cases we will want the "object annotation in the pack index file"
mechanism for, having four tables just to shave 3.5 bits per annotated
object feels like a premature optimization to me.