Re: [RFC PATCH 1/2] add-patch: compare object id instead of literal string

Patrick Steinhardt <ps@xxxxxx> · Mon, 29 Jan 2024 12:48:42 +0100

On Sun, Jan 28, 2024 at 11:41:22PM +0530, Ghanshyam Thakkar wrote:

We typically start commit messages with an explanation of what the
actual problem is that the commit is trying to solve. This helps to set
the stage for any reviewers so that they know why you're doing changes
in the first place.

> Add a new function reveq(), which takes repository struct and two revision
> strings as arguments and returns 0 if the revisions point to the same
> object. Passing a rev which does not point to an object is considered
> undefined behavior as the underlying function memcmp() will be called
> with NULL hash strings.
> 
> Subsequently, replace literal string comparison to HEAD in run_add_p()
> with reveq() to handle more ways of saying HEAD (such as '@' or '$branch'
> where $branch points to same commit as HEAD). This addresses the
> NEEDSWORK comment in run_add_p().
> 
> However, in ADD_P_RESET mode keep string comparison in logical OR with
> reveq() to handle unborn HEAD.
> 
> As for the behavior change, with this patch applied if the given
> revision points to the same object as HEAD, the patch mode will be set to
> patch_mode_(reset,checkout,worktree)_head instead of
> patch_mode_(...)_nothead. That is equivalent of not setting -R flag in
> diff-index, which would have been otherwise set before this patch.
> However, when given same set of inputs, the actual outcome is same as
> before this patch. Therefore, this does not affect any automated scripts.

So this is the closest to an actual description of what your goal is.
But it doesn't say why that is a good idea, it only explains the change
in behaviour.

I think the best thing to do would be to give a sequence of Git commands
that demonstrate the problem that you are trying to solve. This would
help the reader gain a high-level understanding of what you propose to
change.

> Also, add testcases to check the similarity of result between different
> ways of saying HEAD.
> 
> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@xxxxxxxxx>
> ---
> Should the return values of repo_get_oid() be checked in reveq()? As
> reveq() is not a global function and is only used in run_add_p(), the
> validity of revisions is already checked beforehand by builtin/checkout.c
> and builtin/reset.c before the call to run_add_p().
> 
>  add-patch.c               | 28 +++++++++++++++-------
>  t/t2016-checkout-patch.sh | 50 +++++++++++++++++++++++----------------
>  t/t2071-restore-patch.sh  | 21 ++++++++++------
>  t/t7105-reset-patch.sh    | 14 +++++++++++
>  4 files changed, 77 insertions(+), 36 deletions(-)
> 
> diff --git a/add-patch.c b/add-patch.c
> index 79eda168eb..01eb71d90e 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -14,6 +14,7 @@
>  #include "color.h"
>  #include "compat/terminal.h"
>  #include "prompt.h"
> +#include "hash.h"
>  
>  enum prompt_mode_type {
>  	PROMPT_MODE_CHANGE = 0, PROMPT_DELETION, PROMPT_ADDITION, PROMPT_HUNK,
> @@ -316,6 +317,18 @@ static void setup_child_process(struct add_p_state *s,
>  		     INDEX_ENVIRONMENT "=%s", s->s.r->index_file);
>  }
>  
> +// Check if two revisions point to the same object. Passing a rev which does not
> +// point to an object is undefined behavior.

We only use `/* */`-style comments in the Git codebase.

> +static inline int reveq(struct repository *r, const char *rev1,
> +			const char *rev2)
> +{
> +	struct object_id oid_rev1, oid_rev2;
> +	repo_get_oid(r, rev1, &oid_rev1);
> +	repo_get_oid(r, rev2, &oid_rev2);
> +
> +	return !oideq(&oid_rev1, &oid_rev2);
> +}

I don't think it's a good idea to allow for undefined behaviour here.
While more tedious for the caller, I think it's preferable to handle the
case correctly where revisions don't resolve, e.g. by returning `-1` in
case either of the revisions does not resolve.

Patrick
Attachment:
signature.asc

Description: PGP signature