Re: [PATCH v4 4/5] MyFirstObjectWalk: fix description for counting omitted objects

Dirk Gouders <dirk@xxxxxxxxxxx> · Tue, 26 Mar 2024 21:09:08 +0100

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Dirk Gouders <dirk@xxxxxxxxxxx> writes:
>
>> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
>> index a06c712e46..6901561263 100644
>> --- a/Documentation/MyFirstObjectWalk.txt
>> +++ b/Documentation/MyFirstObjectWalk.txt
>> @@ -754,10 +754,12 @@ points to the same tree object as its grandparent.)
>>  === Counting Omitted Objects
>>  
>>  We also have the capability to enumerate all objects which were omitted by a
>> -filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
>> -`traverse_commit_list_filtered()` to populate the `omitted` list means that our
>> -object walk does not perform any better than an unfiltered object walk; all
>> -reachable objects are walked in order to populate the list.
>> +filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
>> +change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
>> +able to populate an `omitted` list.  This list of filtered objects may have
>> +performance implications, however, because despite filtering objects, the possibly
>> +much larger set of all reachable objects must be processed in order to
>> +populate that list.
>
> It may be just me not reading what is obvious to everybody else
> clearly, in which case I am happy to take the above text as-is, but
> the updated text that says a "list" may have "performance
> implications" reads a bit odd.  It would be understandable if you
> said "asking for list of filtered objects may have", though.

Oh yes, you are right (as far as I can say): I would change this to
something like:

"Asking for this list of filtered objects may cause performance
implications, however, because in this case, despite filtering objects,
the possibly much larger set of all reachable objects must be processed
in order to populate that list."

(Later in the document, it is suggested to do timing with the two
versions, which kind of follows up on the performance impact that is
focused on, here.  So, this doesn't remain an unresolved detail.)

> Are you contrasting a call to traverse_commit_list() and
> traverse_commit_list_filtered() and discussing their relative
> performance?  
>
> Of are you contrasting a call to traverse_commit_list_filtered()
> with and without the omitted parameter, and saying that a call with
> omitted parameter asks the machinery to do more work so it has to
> cost more?

This answer has the potential to cause an enhancement request, anyway:

Previously, the document didn't state that
traverse_commit_list_filtered() can be used without asking for a
`omitted` list (and I didn't change that), so the contrasting
in my understanding explicitely is traverse_commit_list()
vs. traverse_commit_list_filtered().

The second of your cases is only included implicitely, for those who
know or can guess they could use NULL as the pointer to `omitted` list.

Thank you for looking at this one more time!

Dirk

> Other than that I had no trouble with this latest round.
>
> Thanks.