Hi,

Stefan Beller wrote:

> This is a real take on the first part of the recent RFC[1].
>
> Jonathan Tan suggested[2] that "sha1_loose_object_info to handle arbitrary repositories"
> might be a good breaking point for a first part at that RFC at patch 38.
> This series is smaller and contains only 26 patches as the patches in the big
> RFC were slightly out of order.

Thanks.  This looks like a nice reviewable series, so I'm happy to see
it broken out.

[...]
> Comments in the early range of that RFC were on 003 where Junio pointed out
> that the coccinelle patch ought to be not in contrib/coccinelle, so I put it
> in a sub directory there, as 'make coccicheck' doesn't traverse subdirs.

Can you say a little more about this?  Was the problem that the
semantic patch wasn't idempotent, that it was too slow to run, or
something else?

If we're including the semantic patch for reference but never running
it, I think I'd prefer it to go in the commit message.  But if it's
useful to run then we should make it idempotent so it can go in
contrib/coccinelle.

[...]
> Duy suggested that we shall not use the repository blindly, but should carefully
> examine whether to pass on an object store or the refstore or such[4], which
> I agree with if it makes sense. This series unfortunately has an issue with that
> as I would not want to pass down the `ignore_env` flag separately from the object
> store, so I made all functions that only take the object store to have the raw
> object store as the first parameter, and others using the full repository.

I think I want to push back on this a little.

The advantage of a function taking e.g. an object_store as an argument
instead of a repository is that it increases its flexibility, since it
allows callers that do not have access to a repository to call it.  The
disadvantage is that it adds this flexibility without any callers
benefiting from it:

 1. It ties us to assumptions from today.  If e.g.
    an object access in the future starts relying on some other
    information from the repository (e.g. its config) then we'd have
    to either add a back-pointer from the object store to its
    repository or add additional arguments for that additional data
    at that point.

    If all callers already have a repository, it is simpler to pass
    that repository as context so that we have the flexibility to
    make more use of it later.

 2. It complicates the caller.  Instead of consistently passing the
    same repository argument as context to functions that access that
    repository, the caller would have to pull out relevant fields like
    the object store from it.

 3. It prevents us from making opportunistic use of other information
    from the repository, such as its name for use in error messages.

In lower-level functions that need to be usable by callers without a
repository (e.g. to find packfiles in an alternate) it makes sense to
not pass a repository, but without such a use case in mind I don't
think it needs to be a general goal.

To put it another way, most callers do not *care* whether they are
working with a repository's object store, ref database, or some other
aspect of the repository.  They just know they want to e.g. read an
object from this repository.

It's similar to how FILE * works: some operations rely on the buffer
the FILE * manages and some other operations only rely on the
underlying file descriptor, but using the FILE * consistently provides
a clean abstraction that generally makes life easier.

> Eric Sunshine brought up memory leaks with the RFC, and I would think to
> have plugged all holes.

Yay, thank you!

I'll try to find time to look at the patches in detail soon, but no
promises (i.e. if someone else reviews them first, then even better
;-)).

Sincerely,
Jonathan