Hi Chris On 23/09/2022 19:55, Stefan Xenos via GitGitGadget wrote:
From: Stefan Xenos <sxenos@xxxxxxxxxx> metacommit.c supports the creation of metacommits and adds the API needed to create and update changes. Create the "modify_change" function that can be called from modification commands like "rebase" and "git amend" to record obsolescences in the change graph. Create the "record_metacommit" function for recording more complicated commit relationships in the commit graph. Create the "write_metacommit" function for low-level creation of metacommits.
The commit message fails to mention that we do not create a "parent-type" header when we create a metacommit but instead abuse the commit message.
As with the other patches there are a lot of style comments, but to try and limit the noise I've only commented on one instance of each - you should apply the comments to all occurrences in all patches.
I've left a couple of questions where I'm not sure exactly what the code is trying to do but apart from an easily fixed NULL pointer de-reference and not actually creating the "parent-type" header it looks pretty good. I was glad to see that there are no obvious memory leaks. I do think the patches in this series would be easier to follow if the function parameter names were nouns rather than verbs.
Signed-off-by: Stefan Xenos <sxenos@xxxxxxxxxx> Signed-off-by: Chris Poucet <poucet@xxxxxxxxxx> --- Makefile | 1 + metacommit.c | 404 +++++++++++++++++++++++++++++++++++++++++++++++++++ metacommit.h | 58 ++++++++ 3 files changed, 463 insertions(+) create mode 100644 metacommit.c create mode 100644 metacommit.h diff --git a/Makefile b/Makefile index 2b847e7e7de..68082ef94c7 100644 --- a/Makefile +++ b/Makefile @@ -1000,6 +1000,7 @@ LIB_OBJS += merge-ort.o LIB_OBJS += merge-ort-wrappers.o LIB_OBJS += merge-recursive.o LIB_OBJS += merge.o +LIB_OBJS += metacommit.o LIB_OBJS += metacommit-parser.o
I think the code to parse and create metacommits (as well as the change table code) could quite happily live in the same file.
diff --git a/metacommit.c b/metacommit.c new file mode 100644 index 00000000000..d2b859a4d3b --- /dev/null +++ b/metacommit.c @@ -0,0 +1,404 @@ +#include "cache.h" +#include "metacommit.h" +#include "commit.h" +#include "change-table.h" +#include "refs.h" + +void init_metacommit_data(struct metacommit_data *state) +{ + memset(state, 0, sizeof(*state)); +}
We'd normally use an initializer macro instead #define METACOMMIT_DATA_INIT = { 0 }
+void clear_metacommit_data(struct metacommit_data *state) +{ + oid_array_clear(&state->replace); + oid_array_clear(&state->origin); +} + +static void compute_default_change_name(struct commit *initial_commit, + struct strbuf* result) +{ + struct strbuf default_name;
The canonical way to initialize an strbuf that is not on the heap is struct strbuf buf = STRBUF_INIT;
+ const char *buffer; + const char *subject; + const char *eol; + int len; + strbuf_init(&default_name, 0); + buffer = get_commit_buffer(initial_commit, NULL); + find_commit_subject(buffer, &subject); + eol = strchrnul(subject, '\n'); + for (len = 0;subject < eol && len < 10; ++subject, ++len) {
There's a space missing after the first ';'. We prefer post-increments to pre-increments unless the pre-increment is significant.
+ char next = *subject; + if (isspace(next)) + continue; + + strbuf_addch(&default_name, next); + } + sanitize_refname_component(default_name.buf, result);
I suspect we need to call unuse_commit_buffer(initial_commit) here.
+} + +/** + * Computes a change name for a change rooted at the given initial commit. Good + * change names should be memorable, unique, and easy to type. They are not + * required to match the commit comment. + */ +static void compute_change_name(struct commit *initial_commit, struct strbuf* result) +{ + struct strbuf default_name; + struct object_id unused; + + strbuf_init(&default_name, 0); + if (initial_commit) + compute_default_change_name(initial_commit, &default_name); + else + strbuf_addstr(&default_name, "change");
What does it mean to call this function with initial_commit == NULL?
+ strbuf_addstr(result, "refs/metas/"); + strbuf_addbuf(result, &default_name); + /* If there is already a change of this name, append a suffix */ + if (!read_ref(result->buf, &unused)) { + int suffix = 2; + int original_length = result->len;
This is one of many places where we have a size_t len or nr member and assign it to an int. I think it would be clearer to use a size_t instead to avoid adding any more signed<->unsigned conversions.
+ + while (1) { + strbuf_addf(result, "%d", suffix); + if (read_ref(result->buf, &unused)) + break; + strbuf_remove(result, original_length, result->len - original_length); + ++suffix; + } + } + + strbuf_release(&default_name); +} + +struct resolve_metacommit_callback_data
While there are some structs with a _callback_data suffix in the code base, it is far more common to use _context and name any corresponding variables ctx.
+{ + struct change_table* active_changes; + struct string_list *changes; + struct oid_array *heads; +}; + +static int resolve_metacommit_callback(const char *refname, void *cb_data) +{ + struct resolve_metacommit_callback_data *data = (struct resolve_metacommit_callback_data *)cb_data;
We don't use redundant casts such as this.
+ struct change_head *chhead; + + chhead = get_change_head(data->active_changes, refname);
This is really a comment on the previous patch but are there uses of for_each_change_referencing() for which just the refname is sufficient? It might be more convenient to pass the change head into the callback as well.
+ + if (data->changes) + string_list_append(data->changes, refname)->util = &(chhead->head);
We don't use redundant parentheses such as this (and this patch does not use them consistently)
+ if (data->heads) + oid_array_append(data->heads, &(chhead->head)); + + return 0; +} + +/** + * Produces the final form of a metacommit based on the current change refs. + */ +static void resolve_metacommit( + struct repository* repo, + struct change_table* active_changes, + const struct metacommit_data *to_resolve,
[testing my understanding] This is the metacommit we want to update
+ struct metacommit_data *resolved_output,
This is the updated metacommit returned to the user
+ struct string_list *to_advance,
Is also an output? It ends up as a list of refname to change head mappings
+ int allow_append) +{ + int i; + int len = to_resolve->replace.nr; + struct resolve_metacommit_callback_data cbdata;
This would be a good place to a designated initializer. struct resolve_metacommit_context ctx = { .active_changes = active_changes, .changes = to_advance, .heads = &resolved_output->replace };
+ int old_change_list_length = to_advance->nr; + struct commit* content; + + oidcpy(&resolved_output->content, &to_resolve->content); + + /* First look for changes that point to any of the replacement edges in the + * metacommit. These will be the changes that get advanced by this + * metacommit. */
Style: '/*' & '*/' should be on their own lines.
+ resolved_output->abandoned = to_resolve->abandoned; + cbdata.active_changes = active_changes; + cbdata.changes = to_advance; + cbdata.heads = &(resolved_output->replace); + + if (allow_append) { + for (i = 0; i < len; i++) { + int old_number = resolved_output->replace.nr; + for_each_change_referencing(active_changes, &(to_resolve->replace.oid[i]), + resolve_metacommit_callback, &cbdata); + /* If no changes were found, use the unresolved value. */ + if (old_number == resolved_output->replace.nr) + oid_array_append(&(resolved_output->replace), &(to_resolve->replace.oid[i]));
We see if there are any refs under refs/metas/ which point to 'to_resolve' or its content and if there are we add those refs and the corresponding change head to 'to_advance'. If we don't find any refs then we copy the replace oid from 'to_resolve' to 'resolved_output'
If allow_append is false then we ignore all the replace oids in 'to_resolve'
+ } + } + + cbdata.changes = NULL; + cbdata.heads = &(resolved_output->origin); + + len = to_resolve->origin.nr; + for (i = 0; i < len; i++) { + int old_number = resolved_output->origin.nr; + for_each_change_referencing(active_changes, &(to_resolve->origin.oid[i]), + resolve_metacommit_callback, &cbdata); + if (old_number == resolved_output->origin.nr) + oid_array_append(&(resolved_output->origin), &(to_resolve->origin.oid[i])); + }
This is copying the origin oids in the same way as we copied the replace oids above.
+ /* If no changes were advanced by this metacommit, we'll need to create a new + * one. */ + if (to_advance->nr == old_change_list_length) { + struct strbuf change_name; + + strbuf_init(&change_name, 80); + content = lookup_commit_reference_gently(repo, &(to_resolve->content), 1); + + compute_change_name(content, &change_name); + string_list_append(to_advance, change_name.buf); + strbuf_release(&change_name); + } +} + +static void lookup_commits( + struct repository *repo, + struct oid_array *to_lookup, + struct commit_list **result) +{ + int i = to_lookup->nr; + + while (--i >= 0) { + struct object_id *next = &(to_lookup->oid[i]); + struct commit *commit = lookup_commit_reference_gently(repo, next, 1); + commit_list_insert(commit, result); + }
We walk backwards because commit_list_insert prepends to the list - good.
+} + +#define PARENT_TYPE_PREFIX "parent-type " + +/** + * Creates a new metacommit object with the given content. Writes the object + * id of the newly-created commit to result. + */ +int write_metacommit(struct repository *repo, struct metacommit_data *state, + struct object_id *result) +{ + struct commit_list *parents = NULL; + struct strbuf comment; + int i; + struct commit *content; + + strbuf_init(&comment, strlen(PARENT_TYPE_PREFIX) + + 1 + 2 * (state->origin.nr + state->replace.nr)); + lookup_commits(repo, &state->origin, &parents); + lookup_commits(repo, &state->replace, &parents); + content = lookup_commit_reference_gently(repo, &state->content, 1); + if (!content) { + strbuf_release(&comment); + free_commit_list(parents); + return -1; + } + commit_list_insert(content, &parents); + + strbuf_addstr(&comment, PARENT_TYPE_PREFIX); + strbuf_addstr(&comment, state->abandoned ? "a" : "c"); + for (i = 0; i < state->replace.nr; i++) + strbuf_addstr(&comment, " r"); + + for (i = 0; i < state->origin.nr; i++) + strbuf_addstr(&comment, " o"); > + /* The parents list will be freed by this call. */ + commit_tree(comment.buf, comment.len, repo->hash_algo->empty_tree, parents, + result, NULL, NULL);
It would be relatively easy to use commit_tree_extended() with extra_headers so that we create a commit with a "parent-type" header rather than abusing the commit message.
struct commit_extra_header extra = { .key = "parent-type" }; /* build header value in strbuf */ extra.value = buf.buf; extra.len = buf.len; commit_tree_extended("", 0, repo->hash_algo->empty_tree, parents, result, NULL, NULL, NULL, &extra);
+ + strbuf_release(&comment); + return 0; +} + +/** + * Returns true iff the given metacommit is abandoned, has one or more origin + * parents, or has one or more replacement parents. + */ +static int is_nontrivial_metacommit(struct metacommit_data *state) +{ + return state->replace.nr || state->origin.nr || state->abandoned; +} + +/* + * Records the relationships described by the given metacommit in the + * repository. + * + * If override_change is NULL (the default), an attempt will be made + * to append to existing changes wherever possible instead of creating new ones. + * If override_change is non-null, only the given change ref will be updated.
So override_head is the refname of an existing change?
+ * options is a bitwise combination of the UPDATE_OPTION_* flags. + */ +int record_metacommit( + struct repository *repo, + const struct metacommit_data *metacommit, const char *override_change, + int options, struct strbuf *err) +{ + struct change_table chtable; + struct string_list changes; + int result; + + change_table_init(&chtable); + change_table_add_all_visible(&chtable, repo); + string_list_init_dup(&changes); + + result = record_metacommit_withresult(repo, &chtable, metacommit, + override_change, options, err, &changes); + + string_list_clear(&changes, 0); + change_table_clear(&chtable); + return result; +} + +/* + * Records the relationships described by the given metacommit in the + * repository. + * + * If override_change is NULL (the default), an attempt will be made + * to append to existing changes wherever possible instead of creating new ones. + * If override_change is non-null, only the given change ref will be updated. + * + * The changes list is filled in with the list of change refs that were updated, + * with the util pointers pointing to the old object IDS for those changes. + * The object ID pointers all point to objects owned by the change_table and + * will go out of scope when the change_table is destroyed.
That potentially sounds like an invitation to create use after free bugs unless we're careful. Does this function need to be public?
+ * + * options is a bitwise combination of the UPDATE_OPTION_* flags. + */ +int record_metacommit_withresult( + struct repository *repo, + struct change_table *chtable, + const struct metacommit_data *metacommit, + const char *override_change, + int options, struct strbuf *err, + struct string_list *changes) +{ + static const char *msg = "updating change"; + struct metacommit_data resolved_metacommit; + struct object_id commit_target; + struct ref_transaction *transaction = NULL; + struct change_head *overridden_head; + const struct object_id *old_head; + + int i; + int ret = 0; + int force = (options & UPDATE_OPTION_FORCE); + + init_metacommit_data(&resolved_metacommit); + + resolve_metacommit(repo, chtable, metacommit, &resolved_metacommit, changes, + (options & UPDATE_OPTION_NOAPPEND) == 0); + + if (override_change) { + string_list_clear(changes, 0); + overridden_head = get_change_head(chtable, override_change); + if (!overridden_head) {
We enter this branch if overridden_head is NULL
+ /* This is an existing change */ + old_head = &overridden_head->head;
Here we de-reference overridden_head which is NULL
+ if (!force) { + if (!oid_array_readonly_contains(&(resolved_metacommit.replace), + &overridden_head->head)) { + /* Attempted non-fast-forward change */ + strbuf_addf(err, _("non-fast-forward update to '%s'"), + override_change); + ret = -1; + goto cleanup; + } + } + } else
Style: if one branch of an if statement requires braces then all branches should have braces.
+ /* ...then this is a newly-created change */ + old_head = null_oid(); + + /* The expected "current" head of the change is stored in the util + * pointer. */ + string_list_append(changes, override_change)->util = (void*)old_head;
No need to cast here
+ } + + if (is_nontrivial_metacommit(&resolved_metacommit)) { + /* If there are any origin or replacement parents, create a new metacommit + * object. */ + if (write_metacommit(repo, &resolved_metacommit, &commit_target) < 0) { + ret = -1; + goto cleanup; + } + } else + /** + * If the metacommit would only contain a content commit, point to the + * commit itself rather than creating a trivial metacommit. + */ + oidcpy(&commit_target, &(resolved_metacommit.content));
Oh, is this optimization why we don't insist on metacommits but also allow ordinary commits to be added to the change table?
diff --git a/metacommit.h b/metacommit.h new file mode 100644 index 00000000000..fdb253f0f04 --- /dev/null +++ b/metacommit.h @@ -0,0 +1,58 @@ +#ifndef METACOMMIT_H +#define METACOMMIT_H + +#include "hash.h" +#include "oid-array.h" +#include "repository.h" +#include "string-list.h" + + +struct change_table; + +/* If specified, non-fast-forward changes are permitted. */ +#define UPDATE_OPTION_FORCE 0x0001 +/** + * If specified, no attempt will be made to append to existing changes. + * Normally, if a metacommit points to a commit in its replace or origin + * list and an existing change points to that same commit as its content, the + * new metacommit will attempt to append to that same change. This may replace + * the commit parent with one or more metacommits from the head of the appended + * changes. This option disables this behavior, and will always create a new + * change rather than reusing existing changes. + */ +#define UPDATE_OPTION_NOAPPEND 0x0002 + +/* Metacommit Data */ + +struct metacommit_data { + struct object_id content; + struct oid_array replace; + struct oid_array origin; + int abandoned; +}; + +extern void init_metacommit_data(struct metacommit_data *state); + +extern void clear_metacommit_data(struct metacommit_data *state); + +extern int record_metacommit(struct repository *repo, + const struct metacommit_data *metacommit, + const char* override_change, int options, struct strbuf *err); + +extern int record_metacommit_withresult( + struct repository *repo, + struct change_table *chtable, + const struct metacommit_data *metacommit, + const char *override_change, + int options, + struct strbuf *err, + struct string_list *changes);
Does this need to be public? i.e. why would one call this rather than record_metacommit()?
+extern void modify_change(struct repository *repo, + const struct object_id *old_commit, const struct object_id *new_commit, + struct strbuf *err); + +extern int write_metacommit(struct repository *repo, struct metacommit_data *state, + struct object_id *result);
The documentation for the flags is very welcome but this header could to with the api being documented as well.
Best Wishes Phillip