Re: [PATCH 2/2] apply: do not use the_repository

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 28, 2024 at 09:33:07AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > On Tue, May 28, 2024 at 06:32:01AM +0000, Philip Peterson via GitGitGadget wrote:
> >> diff --git a/apply.c b/apply.c
> >> index 901b67e6255..364c05fbd06 100644
> >> --- a/apply.c
> >> +++ b/apply.c
> >> @@ -3218,13 +3218,13 @@ static int apply_binary(struct apply_state *state,
> >>  		return 0; /* deletion patch */
> >>  	}
> >>  
> >> -	if (has_object(the_repository, &oid, 0)) {
> >> +	if (has_object(state->repo, &oid, 0)) {
> >>  		/* We already have the postimage */
> >>  		enum object_type type;
> >>  		unsigned long size;
> >>  		char *result;
> >>  
> >> -		result = repo_read_object_file(the_repository, &oid, &type,
> >> +		result = repo_read_object_file(state->repo, &oid, &type,
> >>  					       &size);
> >>  		if (!result)
> >>  			return error(_("the necessary postimage %s for "
> >
> > We call `get_oid_hex()` in this function, which ultimately ends up
> > accessing `the_repository` via `the_hash_algo`. We should likely convert
> > those to `repo_get_oid()`.
> >
> > There are also other accesses to `the_hash_algo` in this function, which
> > should be converted to use `state->repo->hash_algo`, as well.
> 
> We as a more experienced developers should come up with a way to
> help developers who are less familiar with the API set, so that they
> can chip in this effort to wean ourselves off of globals.
> 
> Would a bug like the ones you pointed out be easily caught by say
> running "GIT_TEST_DEFAULT_HASH=sha256 make test"?

No, I don't think so.

I was also thinking about different approaches to this problem overall.
Ideally, I would want to catch this on the programmatic level so that we
do not even have to detect this at runtime. And I think this should be
feasible by introducing something similar to the USE_THE_INDEX_VARIABLE
macro, which we have recently removed.

If we had a USE_THE_REPOSITORY_VARIABLE macro that guards declarations
of:

  - `the_repository`

  - `the_hash_algo`

  - Functions that implicitly rely on either of the above.

Then you could prove that a given code unit does not rely on any of the
above anymore by not declaring that macro.

In fact, these patches prompted me to give this a go this morning. And
while the changes are of course trivial by themselves, I quickly
discovered that they are also pointless because we almost always rely on
either of the above.

The most important reason of this is "hash.h", where `struct object_id`
falls back to `the_hash_algo` in case its own hash algorithm wasn't set.
Ideally, this would never be the case. But a quick test shows that there
are many places where we do rely on the fallback value, mostly around
null OIDs. Fixing this would be a necessary prerequisite.

Another important benefit of the macro would be that we stop working
against a moving target. New files shouldn't declare it, and once a file
has been refactored and the corresponding macro was removed we would
know that we don't add new dependencies on either of the above by
accident.

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux