Re: [PATCH 00/26] Moving global state into the repository object (part 1)

Jonathan Nieder <jrnieder@xxxxxxxxx> · Tue, 13 Feb 2018 11:26:21 -0800

Hi,

Stefan Beller wrote:

> This is a real take on the first part of the recent RFC[1].
>
> Jonathan Tan suggested[2] that "sha1_loose_object_info to handle arbitrary repositories"
> might be a good breaking point for a first part at that RFC at patch 38.
> This series is smaller and contains only 26 patches as the patches in the big
> RFC were slightly out of order.

Thanks.  This looks like a nice reviewable series, so I'm happy to see
it broken out.

[...]
> Comments in the early range of that RFC were on 003 where Junio pointed out
> that the coccinelle patch ought to be not in contrib/coccinelle, so I put it
> in a sub directory there, as 'make coccicheck' doesn't traverse subdirs.

Can you say a little more about this?  Was the problem that the
semantic patch wasn't idempotent, that it was too slow to run, or
something else?

If we're including the semantic patch for reference but never running
it, I think I'd prefer it to go in the commit message.  But if it's
useful to run then we should make it idempotent so it can go in
contrib/coccinelle.

[...]
> Duy suggested that we shall not use the repository blindly, but should carefully
> examine whether to pass on an object store or the refstore or such[4], which
> I agree with if it makes sense. This series unfortunately has an issue with that
> as I would not want to pass down the `ignore_env` flag separately from the object
> store, so I made all functions that only take the object store to have the raw
> object store as the first parameter, and others using the full repository.

I think I want to push back on this a little.

The advantage of a function taking e.g. an object_store as an argument
instead of a repository is that it increases its flexibility, since it
allows callers that do not have access to a repository to call it.  The
disadvantage is also that it increases the flexibility without any
callers benefitting from that:

 1. It ties us to assumptions from today.  If e.g. an object access in
    the future starts relying on some other information from the
    repository (e.g. its config) then we'd have to either add a
    back-pointer from the object store to its repository or add
    additional arguments for that additional data at that point.

    If all callers already have a repository, it is simpler to pass
    that repository as context so that we have the flexibility to make
    more use of it later.

 2. It complicates the caller.  Instead of consistently passing the
    same repository argument as context to functions that access that
    repository, the caller would have to pull out relevant fields like
    the object store from it.

 3. It prevents us from making opportunistic use of other information
    from the repository, such as its name for use in error messages.

In lower-level funcitons that need to be usable by callers without a
repository (e.g. to find packfiles in an alternate) it makes sense to
not pass a repository, but without such a use case in mind I don't
think it needs to be a general goal.

To put it another way, most callers do not *care* whether they are
working with a repository's object store, ref database, or some other
aspect of the repository.  They just know they want to e.g. read an
object from this repository.

It's similar to how FILE * works: some operations rely on the buffer
the FILE * manages and some other operations only rely on the
underlying file descriptor, but using the FILE * consistently provides
a clean abstraction that generally makes life easier.

> Eric Sunshine brought up memory leaks with the RFC, and I would think to
> have plugged all holes.

Yay, thank you!

I'll try to find time to look at the patches in detail soon, but no
promises (i.e. if someone else reviews them first, then even better
;-)).

Sincerely,
Jonathan