On Tue, Feb 16, 2021 at 06:46:59PM -0500, Jeff King wrote: > Do we use this partial traversal to impact the write order at all? That > would be a nice-to-have, but I suspect that just concatenating the packs > (presumably by descending mtime) ends up with a similar result. We don't; the objects are written in pack order. In the version of the patch you reviewed, the order of packs was determined by their hash (due to the string_list_sort()), but the version I just prepared re-sorts by mtime. It's kind of gross, since we need to use QSORT directly on the string_list internals in order to have access to the ->util field of the string_list_items (string_list_sort() only lets you compare strings directly for obvious reasons). I added a comment describing this hack. > > +--stdin-packs:: > > + Read the basenames of packfiles from the standard input, instead > > + of object names or revision arguments. The resulting pack > > + contains all objects listed in the included packs (those not > > + beginning with `^`), excluding any objects listed in the > > + excluded packs (beginning with `^`). > > ++ > > +Incompatible with `--revs`, or options that imply `--revs` (such as > > +`--all`), with the exception of `--unpacked`, which is compatible. > > I know you say "basename" here, but I wonder if it is worth giving an > example (`pack-1234abcd.pack`) to make it clear in what form we expect > it. Or possibly something in the `EXAMPLES` section. Good idea, thanks. > > --- a/builtin/pack-objects.c > > +++ b/builtin/pack-objects.c > > @@ -2979,6 +2979,164 @@ static int git_pack_config(const char *k, const char *v, void *cb) > > return git_default_config(k, v, cb); > > } > > > > +static int stdin_packs_found_nr; > > +static int stdin_packs_hints_nr; > > I scratched my head at these until I looked further in the code. They're > the counters for the trace output. Might be worth a brief comment above > them. (I do approve of adding this kind of trace debugging info; I'm > pretty accustomed to using gdb or adding one-off debug statements, but > we really could do a better job in general of making these kinds of > internals visible to mere mortal admins). Good call. > > +static int add_object_entry_from_pack(const struct object_id *oid, > > + struct packed_git *p, > > + uint32_t pos, > > + void *_data) > > +{ > > + struct rev_info *revs = _data; > > + struct object_info oi = OBJECT_INFO_INIT; > > + off_t ofs; > > + enum object_type type; > > + > > + display_progress(progress_state, ++nr_seen); > > + > > + ofs = nth_packed_object_offset(p, pos); > > + > > + oi.typep = &type; > > + if (packed_object_info(the_repository, p, ofs, &oi) < 0) > > + die(_("could not get type of object %s in pack %s"), > > + oid_to_hex(oid), p->pack_name); > > Calling out for other reviewers: the oi.typep field will be filled in > the with _real_ type of the object, even if it's a delta. This is as > opposed to the return value of packed_object_info(), which may be > OFS_DELTA or REF_DELTA. > > And that real type is what we want here: > > > + else if (type == OBJ_COMMIT) { > > + /* > > + * commits in included packs are used as starting points for the > > + * subsequent revision walk > > + */ > > + add_pending_oid(revs, NULL, oid, 0); > > + } > > And later when we call create_object_entry(). :-). Yes indeed. As I'm sure that you will recall, the pack-objects code _does not_ behave well when you give it the packed type of an object (which is not entirely unexpected, since the pack-objects code only operates on the true type, so passing the packed type--as I did when originally writing this patch--is a bug). > I wondered whether it would be worth adding other objects we might find, > like trees, in order to increase our traversal. But that doesn't make > any sense. The whole point is to find the paths, which come from > traversing from the root trees. And we can only find the root trees by > starting at commits. Adding any random tree we found would defeat the > purpose (most of them are sub-trees and would give us a useless partial > path). Right. > Should we avoid adding the commit as a tip for walking if it won't end > up in the resulting pack? I.e., should we check these: > > > + if (have_duplicate_entry(oid, 0)) > > + return 0; > > + > > + if (!want_object_in_pack(oid, 0, &p, &ofs)) > > + return 0; > > ...first? I guess it probably doesn't matter too much since we'd > truncate the traversal as soon as we saw it was in a kept pack anyway. I agree it doesn't make a difference, but I think placing the extra guards first makes it easier to read (since the reader doesn't have to consider how the subsequent traversal would treat it). > > +static void show_commit_pack_hint(struct commit *commit, void *_data) > > +{ > > +} > > Nothing to do here, since commits don't have a name field. Makes sense. Yeah. I added a comment to say the same thing, just for extra clarity. > > > +static void show_object_pack_hint(struct object *object, const char *name, > > + void *_data) > > +{ > > + struct object_entry *oe = packlist_find(&to_pack, &object->oid); > > + if (!oe) > > + return; > > + > > + /* > > + * Our 'to_pack' list was constructed by iterating all objects packed in > > + * included packs, and so doesn't have a non-zero hash field that you > > + * would typically pick up during a reachability traversal. > > + * > > + * Make a best-effort attempt to fill in the ->hash and ->no_try_delta > > + * here using a now in order to perhaps improve the delta selection > > + * process. > > + */ > > + oe->hash = pack_name_hash(name); > > + oe->no_try_delta = name && no_try_delta(name); > > + > > + stdin_packs_hints_nr++; > > +} > > But for actual objects, we do fill in the hash. I wonder if it's > possible for oe->hash to have been already filled. I don't think it > really matters, though. Any value we get is equally valid, so > overwriting is OK in that case. Right. > > + trace2_data_intmax("pack-objects", the_repository, "stdin_packs_found", > > + stdin_packs_found_nr); > > I wonder if it makes sense to report the actual set of packs via trace > (obviously not as an int, but as a list). That's less helpful for > debugging pack-objects, if you just fed it the input anyway, but if you > were debugging "git repack --geometric" it might be useful to see which > packs it thought were which (though arguably that would be a useful > trace in builtin/repack.c instead). I could see an argument in both ways. I'd rather pass for now until we have a clearer need for it. > [passing --unpacked to the namehash traversal] > > I'm OK to consider that an implementation detail for now, though. We can > change it later without impacting the interface. Agreed. > > + if (rev_list_unpacked) > > + add_unreachable_loose_objects(); > > Despite the name, that function is adding both reachable and unreachable > ones. So it is doing what you want. It might be worth renaming, but it's > not too big a deal since it's local to this file. Yeah, I tend to err on the side of "it's fine as-is" since this isn't exposed outside of pack-objects internals. If you feel strongly I'm happy to change it, but I suspect you don't. Thanks, Taylor