Re: [PATCH 08/17] pack-objects: add --path-walk option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/28/24 3:54 PM, Jonathan Tan wrote:
"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
This new walk is incompatible with some features and is ignored by
others:

  * Object filters are not currently integrated with the path-walk API,
    such as sparse-checkout or tree depth. A blobless packfile could be
    integrated easily, but that is deferred for later.

  * Server-focused features such as delta islands, shallow packs, and
    using a bitmap index are incompatible with the path-walk API.

  * The path walk API is only compatible with the --revs option, not
    taking object lists or pack lists over stdin. These alternative ways
    to specify the objects currently ignores the --path-walk option
    without even a warning.

It might be better to declare --path-walk as "internal use only" and
only supporting what send-pack.c (used by "git push") and "git repack"
needs. (From this list, it seems that there is a lot of incompatibility,
some of which can happen without a warning to the user, so it sounds
better to be up-front and say that we only support what send-pack.c
needs. This also makes reviewing easier, as we don't have to think about
the possible interactions with every other rev-list feature - only what
is used by send-pack.c.)

I do wonder what the value of doing this would be. I consider 'gitpack-objects' to already be a plumbing command, so marking any option
as "internal use only" seems like overkill. It takes effort to combine
the options carefully for the right effect. The tests in p5313 are not
terribly simple, such as needing --no-reuse-delta to guarantee we are
using the desired delta algorithm.

Also from a reviewer perspective, it might be better to restrict this
patch set to what send-pack.c needs and leave "git repack" for a future
patch set. This means that we would not need features such as blob
and tree exclusions, and possibly not even bitmap use or delta reuse
(assuming that the user would typically push recently-created objects
that have not been repacked).

While I can understand that as being a potential place to split the
patch series, the integration to add 'git repack --path-walk' is actually
very simple. Repacking "everything" needs to happen to be able to push a
repo to an empty remote, after all.

There are some subtleties around indexed objects, reflogs, and the like
that add some complexity, but they also are handled in the path-walk API
layer. Some of that complexity was helpful to know about during repack
tests.

Finally, the 'git repack --path-walk' use case is a great one for
demonstrating the benefits to threading the 'git pack-objects
--path-walk' algorithm.

+		/* Skip objects that do not exist locally. */
+		if (exclude_promisor_objects &&
+		    oid_object_info_extended(the_repository, oid, &oi,
+					     OBJECT_INFO_FOR_PREFETCH) < 0)
+			continue;

This functionality is typically triggered by --missing=allow;
--exclude_promisor_objects means (among other things) that we allow
a missing link only if that object is known to be a promisor object
(because another promisor object refers to it) (see Documentation/
rev-list-options.txt, and also get_reference() and elsewhere in
revision.c - notice how is_promisor_object() is paired with it)

Having said that, we should probably just fail outright on missing
objects, whether or not we have exclude_promisor_objects. If we have
computed that we need to push an object, that object needs to exist.
(Same for repack.)

I think that this is not a reasonable assumption that a hard fail
should be expected. Someone could create a blobless clone with a
sparse-checkout and then add a new file outside their sparse-checkout
without ever having the previous version downloaded.

When pushing, they won't be able to use the previous version as a
delta base, but they would certainly be confused about an object
being downloaded during 'git push'.

While the example I bring up is somewhat contrived, I can easily
imagine cases where missing objects are part of the commit boundary
and could be marked as UNINTERESTING but would still be sent in the
batch to be considered as a delta base.

Thanks,
-Stolee





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux