Re: [PATCH v4 3/6] pack-objects: add --sparse option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/11/2019 5:30 PM, Junio C Hamano wrote:
"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>

Add a '--sparse' option flag to the pack-objects builtin. This
allows the user to specify that they want to use the new logic
for walking trees. This logic currently does not differ from the
existing output, but will in a later change.

Create a new test script, t5322-pack-objects-sparse.sh, to ensure
the object list that is selected matches what we expect. When we
update the logic to walk in a sparse fashion, the final test will
be updated to show the extra objects that are added.
Somehow checking the "these are exactly what we expect" feels
brittle.  In a history with three relevant commits A---B---C,
packing B..C could omit trees and blobs in C that appear in A but
not in B, but traditionally, because we stop traversal at B and do
not even look at A, we do not notice that such objects that need to
complete C's tree are already available in a repository that has B.
The approach of test in this patch feels similar to saying "we must
send these duplicates because we must stay dumb", even though with a
perfect knowledge about the history, e.g. bitmap, we would be able
to omit these objects in A that appear in C but not in B.
My intention with this test was to show that the existing algorithm also sends "extra" objects in certain situations, which is later contrasted by a test where the new algorithm sends objects the old algorithm did not.

Instead of "we must stay dumb" I wanted to say "we are not dumber (in this case)".

I understand your brittle feeling. If the logic changed in either to be smarter, then we would need to change the test. In some sense, that does mean we have extra coverage of "this is how it works now" so we would understand the behavior change of that hypothetical future change.

I think we want to test test both "are we sending enough, even
though we might be wasting some bandwidth by not noticing the other
side already have some?" and "are we still avoiding from becoming
overly stupid, even though we may be cheating to save traversal cost
and risking to redundantly send some objects?"

IOW, a set of tests to make sure that truly new objects are all sent
by seeing what is sent is a strict superset of what we expect.  And
another set of tests to make sure that objects that are so obviously
(this criterion may be highly subjective) be present in the
receiving repository (e.g. the tree object of commit B and what it
contains when seinding B..C) are never sent, even when using an
algorithm that are tuned for traversal cost over bandwidth
consumption.

To properly test "these objects are included," do we have an established pattern for saying "this file is a subset of that file"? Or, should I use something like `comm expected actual >common && test_cmp expected common`?

The code change in this step looks all trivially good, and it may
want to be squashed into the previous step to become a single patch.
Otherwise, [2/6] would be adding a dead code that nobody exercises
until this step.

Will squash.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux