Jeff King <peff@xxxxxxxx> writes: > +int commit_metapack(unsigned char *sha1, > + uint32_t *timestamp, > + unsigned char **tree, > + unsigned char **parent1, > + unsigned char **parent2) > +{ > + struct commit_metapack *p; > + > + prepare_commit_metapacks(); > + for (p = commit_metapacks; p; p = p->next) { > + unsigned char *data; > + int pos = sha1_entry_pos(p->index, 20, 0, 0, p->nr, p->nr, sha1); This is a tangent, but isn't it about time to rip out the check for GIT_USE_LOOKUP in find_pack_entry_one(), I wonder. > + prepare_commit_metapacks(); > + for (p = commit_metapacks; p; p = p->next) { > + unsigned char *data; > + int pos = sha1_entry_pos(p->index, 20, 0, 0, p->nr, p->nr, sha1); > + if (pos < 0) > + continue; > + > + /* timestamp(4) + tree(20) + parents(40) */ > + data = p->data + 64 * pos; > + *timestamp = *(uint32_t *)data; > + *timestamp = ntohl(*timestamp); > + data += 4; > + *tree = data; > + data += 20; > + *parent1 = data; > + data += 20; > + *parent2 = data; > + > + return 0; > + } > + > + return -1; > +} I am torn on this one. These cached properties of a single commit will not change no matter which pack it appears in, and it feels logically wrong, especially when you record these object names in the full SHA-1 form, to tie a "commit metapack" to a pack. Logically there needs only one commit metapack that describes all the commits known to the repository when the metapack was created. In order to reduce the disk footprint and I/O cost, the future direction for this mechanism may want to point into an existing store of SHA-1 hashes with a shorter file offset, and the .idx file could be such a store, and in order to move in that direction, you cannot avoid tying a metapack to a pack. > +static void get_commits(struct metapack_writer *mw, > + const unsigned char *sha1, > + void *data) > +{ > + struct commit_list ***tail = data; > + enum object_type type = sha1_object_info(sha1, NULL); > + struct commit *c; > + > + if (type != OBJ_COMMIT) > + return; > + > + c = lookup_commit(sha1); > + if (!c || parse_commit(c)) > + die("unable to read commit %s", sha1_to_hex(sha1)); > + > + /* > + * Our fixed-size parent list cannot represent root commits, nor > + * octopus merges. Just skip those commits, as we can fallback > + * in those rare cases to reading the actual commit object. > + */ > + if (!c->parents || > + (c->parents && c->parents->next && c->parents->next->next)) > + return; > + > + *tail = &commit_list_insert(c, *tail)->next; > +} It feels somewhat wasteful to: - use commit_list for this, rather than an array of commit objects. If you have a rough estimate of the number of commits in the pack, you could just preallocate a single array and use ALLOC_GROW() on it, no? - iterate over the .idx file and run sha1_object_info() and parse_commit() on many objects in the SHA-1 order. Iterating in the way builtin/pack-objects.c::get_object_details() does avoids jumping around in existing packfiles, which may be more efficient, no? > +void commit_metapack_write(const char *idx) > +{ > + struct metapack_writer mw; > + struct commit_list *commits = NULL, *p; > + struct commit_list **tail = &commits; > + uint32_t nr = 0; > + > + metapack_writer_init(&mw, idx, "commits", 1); > + > + /* Figure out how many eligible commits we've got in this pack. */ > + metapack_writer_foreach(&mw, get_commits, &tail); > + for (p = commits; p; p = p->next) > + nr++; -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html