Re: [PATCH v2 18/24] pack-bitmap-write: build fewer intermediate bitmaps

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/24/2020 8:46 PM, Jonathan Tan wrote:
> I've reviewed the commit message already [1]; now let's look at the code.
> 
> [1] https://lore.kernel.org/git/20201124060738.762751-1-jonathantanmy@xxxxxxxxxx/
> 
>>  struct bb_commit {
>>  	struct commit_list *reverse_edges;
>> +	struct bitmap *commit_mask;
>>  	struct bitmap *bitmap;
>> -	unsigned selected:1;
>> +	unsigned selected:1,
>> +		 maximal:1;
> 
> The code itself probably should contain comments about the algorithm,
> but I'm not sure of the best way to do it. (E.g. I would say that
> "maximal" should be "When iteration in bitmap_builder_init() reaches
> this bb_commit, this is true iff none of its descendants has or will
> ever have the exact same commit_mask" - but then when do we explain why
> the commit_mask matters?)

Comments are tricky, as they are likely to go stale. In fact,
this algorithm changes dramatically later in this very series!

How much can we expect a reader to inspect the commit history to
discover the lengthy commit message? The message explains the
algorithm and its many subtleties versus reading comments that may
be too specific to this initial version.

At this point, "maximal" is a property that doesn't mean much
without inspecting the places where we set or check that bit.

>> +		if (c_ent->maximal) {
>> +			if (!c_ent->selected) {
>> +				bitmap_set(c_ent->commit_mask, num_maximal);
>> +				num_maximal++;
>> +			}
>> +
>> +			ALLOC_GROW(bb->commits, bb->commits_nr + 1, bb->commits_alloc);
>> +			bb->commits[bb->commits_nr++] = commit;
> 
> So the order of bit assignments in the commit_mask and the order of
> commits in bb->commits are not the same. In the commit_mask, bits are
> first assigned for selected commits and then the rest for commits we
> discover to be maximal. But in bb->commits, the order follows the
> topologically-sorted iteration. This is fine, but might be worth a
> comment (to add to the already big comment burden...)

This one I waffled on a bit. There _is_ a difference between the
bitmask order and this list order. This isn't a problem as long as
no one attempts to use the bitmasks to navigate into this list.

Further, it is entirely possible that the tests to never demonstrate
that difference, so pointing out may help a future developer from
making that mistake. Could be good to add this comment over the
bb->commits definition:

	/*
	 * 'commits' stores the list of maximal commits, in visited
	 * order. This can be different than the bitmask order for
	 * the selected commits.
	 */

>> +# To ensure the logic for "maximal commits" is exercised, make
>> +# the repository a bit more complicated.
>> +#
>> +#    other                         master
>> +#      *                             *
>> +# (99 commits)                  (99 commits)
>> +#      *                             *
>> +#      |\                           /|
>> +#      | * octo-other  octo-master * |
>> +#      |/|\_________  ____________/|\|
>> +#      | \          \/  __________/  |
>> +#      |  | ________/\ /             |
>> +#      *  |/          * merge-right  *
>> +#      | _|__________/ \____________ |
>> +#      |/ |                         \|
>> +# (l1) *  * merge-left               * (r1)
>> +#      | / \________________________ |
>> +#      |/                           \|
>> +# (l2) *                             * (r2)
>> +#       \____________...____________ |
> 
> What does the ... represent? If a certain number of commits, it would be
> clearer to write that there.

The ... are unnecessary and should be ___ instead. Thanks.
 
>> +#                                   \|
>> +#                                    * (base)
> 
> OK - some of the crosses are unclear, but from the bitmask given below,
> I know where the lines should go.
> 
>> +#
>> +# The important part for the maximal commit algorithm is how
>> +# the bitmasks are extended. Assuming starting bit positions
>> +# for master (bit 0) and other (bit 1), and some flexibility
>> +# in the order that merge bases are visited, the bitmasks at
>> +# the end should be:
>> +#
>> +#      master: 1       (maximal, selected)
>> +#       other: 01      (maximal, selected)
>> +# octo-master: 1
>> +#  octo-other: 01
>> +# merge-right: 111     (maximal)
>> +#        (l1): 111
>> +#        (r1): 111
>> +#  merge-left: 1101    (maximal)
>> +#        (l2): 11111   (maximal)
>> +#        (r2): 111101  (maximal)
>> +#      (base): 1111111 (maximal)
> 
> This makes sense. (l1) and (r1) are not maximal because everything that
> can reach merge-right can also reach them.
> 
> [snip]
> 
>>  test_expect_success 'full repack creates bitmaps' '
>> -	git repack -ad &&
>> +	GIT_TRACE2_EVENT_NESTING=4 GIT_TRACE2_EVENT="$(pwd)/trace" \
>> +		git repack -ad &&
>>  	ls .git/objects/pack/ | grep bitmap >output &&
>> -	test_line_count = 1 output
>> +	test_line_count = 1 output &&
>> +	grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
>> +	grep "\"key\":\"num_maximal_commits\",\"value\":\"111\"" trace
> 
> From the diagram and bit masks, I see that the important number for
> "maximal" is 7. Could this test be run twice - one without the crosses
> and one with, and we can verify that the difference between the maximal
> commits is 7? As it is, this 111 number is hard to verify.

This number _is_ hard to verify. It is fragile to many behaviors inside
the code of Git, including the selection algorithm and some hard-coded
limits (there's a reason we insert 100 commits on top of each side).
Further, this number changes as the algorithm is modified.

Perhaps the best way to recognize this number is that it changes by adding
5 to the previous number (these are the 5 "newly maximal" commits, since
two are already selected as tips).

This number changes again later, and the difference is justified by
the number of maximal commits dropping by 4.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux