On Fri, Oct 20, 2023 at 6:41 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Junio C Hamano <gitster@xxxxxxxxx> writes: > > > Rather, I was wondering if we need to use object flags to mark these > > objects, or can do what we want to do without using any object flags > > at all. For the purpose of reporting "missing" objects, wouldn't it > > be sufficient to walk the object graph and report our findings as we > > go? To avoid reporting the same object twice, as we reasonably can > > expect that the missing objects are minority (compared to the total > > number of objects), perhaps the codepath that makes such a report > > can use a hashmap of object_ids or something, for example. > > Digging from the bottom, > > * builtin/rev-list.c:show_commit() gets "struct rev_list_info *" > that has "struct rev_info *" [*]. > > * list-objects.c:do_traverse() calls revision.c:get_revision() to > obtain commits, some of which may be missing ones, and things > behind get_revision() are responsible for marking the commit as > missing. It has "struct traversal_context *", among whose > members is the "revs" member that is the "struct rev_info *". > > * revision.c:get_revision() and machinery behind it ultimately > discovers a missing commit in the revision.c:process_parents() > that loops over the parents commit_list. It of course has access > to "struct rev_info *". > > So, presumably, if we add a new member to "struct rev_info" that > optionally [*] points at an oidset that records the object names of > missing objects we discovered so far (i.e., the set of missing > objects), the location we set the MISSING bit of a commit can > instead add the object name of the commit to the set. And we can > export a function that takes "struct rev_info *" and "struct object > *" (or "struct object_id *") to check for membership in the "set of > missing objects", which would be used where we checked the MISSING > bit of a commit. > > I do not know the performance implications of going this route, but > if we do not find a suitable vacant bit, we do not have to use any > object flags bit to do this, if we go this route, I would think. I > may be missing some details that breaks the above outline, though. > > > [Footnotes] > > * A potential #leftoverbits tangent. > > Why is "rev_list_info" structure declared in <bisect.h>? I > suspect that this is a fallout from recent header file shuffling, > but given who uses it (among which is rev-list:show_commit() that > has very little to do with bisection and uses the information in > rev_list_info when doing its normal non-bisect things), it does > not make much sense. > > * When .do_not_die_on_missing_objects is false, it can and should > be left NULL, but presumably we use the "do not die" bit even > when we are not necessarily collecting the missing objects? So > the new member cannot replace the "do not die" bit completely. Thanks for the suggestion, this does seem like a good way to go ahead without using flags. The only performance issue being if there are too many commits which are missing, then oidset would be large. But I think that's okay though. > Thanks for researching. It sounds like it may be a better bit to > steal than the one used by the commit-graph, as long as there is no > reason to expect that blame may want to work in a corrupt repository > with missing objects, but when it happens, we may regret the > decision we are making here. > I don't see blame working with missing commits though, because it relies on parsing commits to get information to show to the user. So I think it's a safe bit to steal. Also, when the time comes we could always release the bit and move to the solution you mentioned above. Anyways on the whole I think keeping it future compatible makes a lot more sense. I'll send a patch series to implement an oidset instead of flags soon. - Karthik