Special note and links
======================

This v3 is sent in reply to v1, as v2 was mistakenly sent in reply to an
unrelated patch series.

v1: https://lore.kernel.org/git/20240418184043.2900955-1-christian.couder@xxxxxxxxx/
v2: https://lore.kernel.org/git/20240515132543.851987-1-christian.couder@xxxxxxxxx/

Rationale
=========

`git pack-objects` already supports a `--missing=<missing-action>`
option, so that it can avoid erroring out if some objects aren't
available.

It would be useful for `git upload-pack` to support a similar way to
avoid sending some objects when they aren't available on the server.

For example, if both the server and the client are using a separate
promisor remote that contains some objects, it can be better if the
server doesn't try to send such objects back to the client, but instead
lets the client get those objects separately from the promisor remote.
(The client needs to have the separate promisor remote configured for
that to work.)

This could work better if there were something, perhaps a capability,
for the client to tell the server something like: "I know how to fetch
missing objects from this and that promisor remote, so if you choose
to, you may omit objects that you know are available from these
promisor remotes when sending objects to me."

But that capability could be added later, as other similar capabilities
in this area could also be very useful. For example, when a client
clones, something like the following is currently needed:

  GIT_NO_LAZY_FETCH=0 git clone -c remote.my_promisor.promisor=true \
    -c remote.my_promisor.fetch="+refs/heads/*:refs/remotes/my_promisor/*" \
    -c remote.my_promisor.url=<MY_PROMISOR_URL> \
    --filter="blob:limit=5k" server

But it would be nice if there were a capability for the client to say
that it would like the server to give it information about the promisor
remote it could use, so that the user doesn't have to pass all the
"remote.my_promisor.XXX" config options on the command line.

(It would then be a bit similar to the bundle-uri feature, where all the
bundle-related information comes from the server.)

Another example use of this feature could be a server where some
objects have been corrupted or deleted. It could still be useful for
clients who could get those objects from another source, like perhaps a
different client, to be able to fetch or clone from the server.

The fact that the new `uploadpack.missingAction` configuration variable
has to be set to a non-default value on the server means that regular
client users cannot hurt themselves with this feature.

As `git rev-list` also supports a `--missing=<missing-action>` option,
the first 2 patches in this series refactor the related code from both
`git rev-list` and `git pack-objects` into new "missing.{c,h}" files.

Patch 3/3 then adds a new `uploadpack.missingAction` configuration
variable.

Changes between v2 and v3
=========================

The changes since v2 are the following:

- In patch 2/3, the show_object_fn_from_action() function was replaced
  by a `static show_object_fn const fn[]` lookup table.

- In patch 2/3, a new parse_missing_action_value_for_packing()
  function was introduced in the missing action API to simplify
  pack-objects' code.

- In patch 3/3, the new parse_missing_action_value_for_packing()
  function is used too.

- In patch 3/3, the commit message has been improved to make it more
  assertive and to clarify how the new feature can be used in the
  context of promisor remotes.

- In patch 3/3, `TEST_PASSES_SANITIZE_LEAK=true` was removed from
  't/t5706-upload-pack-missing.sh', as the leak tests don't pass with
  it. The leaks seem to be in existing config and promisor related
  code.

- In patch 3/3, `sed -n "s/^\?\(.*\)/\1/p"` was replaced with
  `perl -ne 'print if s/^[?]//'` in 't/t5706-upload-pack-missing.sh',
  as the former doesn't seem to work in our "linux-musl (alpine)"
  tests.

Thanks to Junio for his reviews of v1 and v2, and for suggesting most
of the above changes.
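
To make the missing action API discussed above more concrete, here is a
minimal, self-contained C sketch of how a value like the one given to
`uploadpack.missingAction` could be parsed, restricted to the actions
that make sense when packing objects, and dispatched through a
`static ... const` function table, similar in spirit to the
`static show_object_fn const fn[]` change in patch 2/3. It is not the
code from the series: the `sketch_*` functions, the `missing_fn` type
and the handlers are hypothetical, the accepted strings are assumed to
be those of `--missing=<missing-action>`, and the `-2 - res` return
value only mirrors the line visible in the range diff, with assumed
semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not git's missing.{c,h}: the enum values and the
 * accepted strings are assumed to mirror --missing=<missing-action>.
 */
enum missing_action {
	MA_ERROR = 0,      /* default: fail on any missing object */
	MA_ALLOW_ANY,      /* silently skip any missing object */
	MA_PRINT,          /* list missing objects (rev-list only) */
	MA_ALLOW_PROMISOR, /* skip only objects expected from a promisor */
};

/* Map an option or config value to an action; -1 if unknown. */
int sketch_parse_missing_action(const char *value)
{
	if (!value)
		return -1; /* e.g. a config key given without "= value" */
	if (!strcmp(value, "error"))
		return MA_ERROR;
	if (!strcmp(value, "allow-any"))
		return MA_ALLOW_ANY;
	if (!strcmp(value, "print"))
		return MA_PRINT;
	if (!strcmp(value, "allow-promisor"))
		return MA_ALLOW_PROMISOR;
	return -1;
}

/*
 * Same, but only accept actions usable when packing objects.
 * Unknown values give -1; known but unusable ones give -2 - action,
 * so every invalid value is negative (assumed convention).
 */
int sketch_parse_missing_action_for_packing(const char *value)
{
	int res = sketch_parse_missing_action(value);

	if (res < 0)
		return -1;
	if (res == MA_PRINT)
		return -2 - res;
	return res;
}

/* Hypothetical handler for one missing object: 1 = skip it, 0 = fail. */
typedef int (*missing_fn)(int is_promisor_object);

static int fail_on_missing(int is_promisor_object)
{
	(void)is_promisor_object;
	return 0;
}

static int skip_any_missing(int is_promisor_object)
{
	(void)is_promisor_object;
	return 1;
}

static int skip_promisor_missing(int is_promisor_object)
{
	return is_promisor_object ? 1 : 0;
}

/* Lookup table indexed by action, instead of a switch in a helper. */
static missing_fn const missing_fns[] = {
	[MA_ERROR] = fail_on_missing,
	[MA_ALLOW_ANY] = skip_any_missing,
	[MA_PRINT] = NULL, /* never reached: rejected when packing */
	[MA_ALLOW_PROMISOR] = skip_promisor_missing,
};

/* 1 = skip the missing object, 0 = fail, -1 = invalid config value. */
int sketch_may_skip_missing(const char *config_value, int is_promisor_object)
{
	int action = sketch_parse_missing_action_for_packing(config_value);

	if (action < 0)
		return -1;
	assert(missing_fns[action] != NULL);
	return missing_fns[action](is_promisor_object);
}
```

With such a scheme, a value like `print`, which only makes sense for
`git rev-list`, comes back negative, so a caller such as upload-pack's
config callback can simply die() on `res < 0`, as in the last hunk of
the range diff. On a server with this series applied, the feature would
presumably be enabled with something like
`git config uploadpack.missingAction allow-promisor`.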

Range diff between v2 and v3
============================

(This might not be very useful, as the changes in patch 2/3 are not
shown.)

1:  0a961dd4f5 = 1:  67c761b08a rev-list: refactor --missing=<missing-action>
2:  410acc6a39 < -:  ---------- pack-objects: use the missing action API
-:  ---------- > 2:  7bf04f3096 pack-objects: use the missing action API
3:  0f5efb064b ! 3:  bac909a070 upload-pack: allow configuring a missing-action
    @@ Metadata
      ## Commit message ##
         upload-pack: allow configuring a missing-action

    -    In case some objects are missing from a server, it might still be
    +    In case some objects are missing from a server, it is sometimes
         useful to be able to fetch or clone from it if the client already has
         the missing objects or can get them in some way.

    -    For example, in case both the server and the client are using a
    -    separate promisor remote that contain some objects, it can be better
    -    if the server doesn't try to send such objects back to the client, but
    -    instead let the client get those objects separately from the promisor
    -    remote. (The client needs to have the separate promisor remote
    -    configured, for that to work.)
    +    Suppose repository S borrows from its "promisor" X, and repository C
    +    which initially cloned from S borrows from its "promisor" S. If C
    +    wants an object in order to fill in the gap in its object graph, and
    +    S does not have it (as S itself has no need for that object), then it
    +    makes sense to let C go directly to X bypassing S.

         Another example could be a server where some objects have been
         corrupted or deleted. It could still be useful for clients who could
    @@ Documentation/config/uploadpack.txt: uploadpack.allowRefInWant::
     +	still get them from somewhere else.

      ## missing.c ##
    -@@ missing.c: int parse_missing_action_value(const char *value)
    -
    - 	return -1;
    +@@ missing.c: int parse_missing_action_value_for_packing(const char *value)
    + 		return -2 - res;
    + 	}
      }
     +
     +const char *missing_action_to_string(enum missing_action action)
    @@ missing.c: int parse_missing_action_value(const char *value)
     +}

      ## missing.h ##
    -@@ missing.h: enum missing_action {
    -  */
    - int parse_missing_action_value(const char *value);
    +@@ missing.h: int parse_missing_action_value(const char *value);
    +  */
    + int parse_missing_action_value_for_packing(const char *value);

    ++/* Return a short string literal describing the action. */
     +const char *missing_action_to_string(enum missing_action action);
     +
      #endif /* MISSING_H */
    @@ t/t5706-upload-pack-missing.sh (new)
     +
     +test_description='handling of missing objects in upload-pack'
     +
    -+TEST_PASSES_SANITIZE_LEAK=true
     +. ./test-lib.sh
     +
     +# Setup the repository with three commits, this way HEAD is always
    @@ t/t5706-upload-pack-missing.sh (new)
     +
     +check_missing_objects () {
     +	git -C "$1" rev-list --objects --all --missing=print > all.txt &&
    -+	sed -n "s/^\?\(.*\)/\1/p" <all.txt >missing.txt &&
    ++	perl -ne 'print if s/^[?]//' all.txt >missing.txt &&
     +	test_line_count = "$2" missing.txt &&
     +	test "$3" = "$(cat missing.txt)"
     +}
    @@ upload-pack.c: static int upload_pack_config(const char *var, const char *value,
      	} else if (!strcmp("transfer.advertisesid", var)) {
      		data->advertise_sid = git_config_bool(var, value);
     +	} else if (!strcmp("uploadpack.missingaction", var)) {
    -+		int res = parse_missing_action_value(value);
    -+		if (res < 0 || (res != MA_ERROR &&
    -+				res != MA_ALLOW_ANY &&
    -+				res != MA_ALLOW_PROMISOR))
    ++		int res = parse_missing_action_value_for_packing(value);
    ++		if (res < 0)
     +			die(_("invalid value for '%s': '%s'"), var, value);
     +		/* Allow fetching only from promisor remotes */
     +		if (res == MA_ALLOW_PROMISOR)

Christian Couder (3):
  rev-list: refactor --missing=<missing-action>
  pack-objects: use the missing action API
  upload-pack: allow configuring a missing-action

 Documentation/config/uploadpack.txt |   9 ++
 Makefile                            |   1 +
 builtin/pack-objects.c              |  37 ++++----
 builtin/rev-list.c                  |  43 ++--------
 missing.c                           |  53 ++++++++++++
 missing.h                           |  28 +++++++
 t/t5706-upload-pack-missing.sh      | 125 ++++++++++++++++++++++++++++
 upload-pack.c                       |  17 ++++
 8 files changed, 255 insertions(+), 58 deletions(-)
 create mode 100644 missing.c
 create mode 100644 missing.h
 create mode 100755 t/t5706-upload-pack-missing.sh

-- 
2.45.1.219.g07663fd880