Re: [PATCH 00/21] Introduce `USE_THE_REPOSITORY_VARIABLE` macro

Patrick Steinhardt <ps@xxxxxx> · Wed, 12 Jun 2024 09:37:37 +0200

On Tue, Jun 11, 2024 at 11:24:30PM +0000, brian m. carlson wrote:
> On 2024-06-11 at 11:57:33, Patrick Steinhardt wrote:
> > Hi,
> > 
> > use of the `the_repository` variable is nowadays considered to be
> > deprecated, and over time we want to convert our codebase to stop using
> > it in favor of explicitly passing down the repository to functions via
> > parameters. This effort faces some important problems though.
> > 
> >   - It is hard to prove that a certain code unit does not use
> >     `the_repository` anymore when sending patch series. The reviewer has
> >     no way to verify that it's not used anymore without reading through
> >     the code itself.
> > 
> >   - It is easy to sneak in new usages of `the_repository` by accident
> >     into a code unit that is already `the_repository`-clean.
> > 
> >   - There are many functions which implicitly use `the_repository`,
> >     which is really hard to spot.
> > 
> > This patch series aims to address those problems by introducing a new
> > `USE_THE_REPOSITORY_VARIABLE` macro. When unset, then the declarations
> > of `the_repository`, `the_hash_algo` and some functions that implicitly
> > depend on them will be hidden away. This makes it trivial to demonstrate
> > that a code unit is `the_repository`-free by removing the definition of
> > any such macro.
> 
> Overall, I left a few comments, but I think this definitely moves us in
> the right direction and I'm glad to see it.  This obviously improves the
> experience with libification and unit testing in a lot of ways, which is
> good.
> 
> My only caution is that using the *_any functions will cause us a world
> of pain if we ever adopt another 256-bit hash function, since it will be
> ambiguous which algorithm is to be used.  That's why, traditionally, we
> haven't assumed a hash algorithm based on the object ID length.  I don't
> think the amount of uses we have is excessive, even with your changes,
> but we'll need to be mindful of that going forward.

The only cases where I add new calls to `_any()` are in test helpers:

  - "t/helper/test-oidtree.c". This one is getting converted to a unit
    test by Ghanshyam, so I'll leave it to him to improve this.

  - "t/helper/test-proc-receive.c". Here we don't care about the actual
    algorithm, the only thing we care about is that we can correctly
    parse them and then eventually emit them via `oid_to_hex()` again.
    So even if we introduce a second hash function with the same length
    this code would continue to work alright.

So I think it should be fine in the context of this series. But the
remark is certainly valid and something we should be cautious about
going forward.

Thanks!

Patrick
Attachment:
signature.asc

Description: PGP signature