On Wed, Oct 27 2021, Derrick Stolee wrote: > I'm starting this discussion thread to create a new area to consider these > high-level concepts. Specifically: how should a new component like Scalar > be included in the Git codebase? For context. I've submitted a PATCH form of the back & forth you/me/Johannes have been having on this topic, which hopefully should unambiguously clear up exactly what it is that I've been suggesting: https://lore.kernel.org/git/patch-1.1-86fb8d56307-20211028T185016Z-avarab@xxxxxxxxx/ As noted there some inline comments here, to the extent that it's not addressed by the patch in some way. > Options for preferred end state > ------------------------------- > > Let's get into some concrete proposals for the location of the Scalar CLI > within the Git codebase. These are ordered based on increasing > responsibility on the Git community: the first option should minimize > community responsibility and maximize responsibility on the Scalar > maintainers. We will discuss the pros and cons of each option after fully > describing each of them. > > (Please add any options that I may have missed!) I think what I'd like to add is...[continued below]... > **Option 1:** One-directional coupling in `contrib/scalar/` > > The Scalar CLI exists entirely within `contrib/scalar/`, including all code > and test infrastructure. To opt-in to building and testing Scalar, run > `make -C contrib/scalar test` to build Git and Scalar and run the Scalar > test scripts. Aside: I think anyone reading this summary might assume that this 1st item describes the current state of the scalar patches on-list, but that's not the case. I.e. that coupling is thoroughly bi-directional in its integration, and the code is built by default. > [...] > **Option 2:** Loose coupling in a new scalar/ directory > [...] > **Option 3:** Tight coupling with entire Git project > [...] ...[continued from above]...an "Option 0: Do we care?" I think I've been the only one who's been pointing out anything about the location of these source files, which prompted this mail. But it's not because I care about the location per-se. I don't think it matters, and I think arguing about that per-se would just amount to needless bikeshedding. What I *have* been pointing out is to the effect of "hey, your build/test integration is broken in <xyz> ways, it looks to me like the easiest way out of that happens to be to move these files to the top-level". E.g. the scalar series is now at v6, but apparently it hasn't been noticed that some of the tests were broken on OSX due to apparent errors with launchctl integration. That's because the patches had the code compiled, but didn't run the tests. If those and other issues with the series were fixed in a way that puts it at whatever in-tree path you & Johannes would prefer I wouldn't mind at all, and I doubt anyone else would have much of a strong preference either. I think the main reason there's been resistance from you two to consider moving it away from "contrib/*" (but correct me if I'm wrong, I'm still unclear on this point, even after reading this E-Mail in full) is that you think it's important to communicate a certain maintenance status for "scalar", so that we can freely change it, and even "git rm" it one day when it's served its purpose. I think the patch I've linked to above should address that in a way that's primarily aimed at communicating that to users, not just git.git developers. I.e. we now have it listed in "man git" under a "Optional contrib commands" section that says: The following commands are included with the git sources, but may not be present in your installation. These should be considered "contrib"-level when it comes to maintenance and stability promises. They might not even be included in your installation, and may either drastically change in the future, or go away entirely. So I guess my preferred option is: "Option 0: Make it integrate with git.git in a way that's not buggy". I believe the patch I've submitted gets us to that state. If you wanted to run with that & move it around in-tree I think that would be fine, as long as we're not introducing bugs by doing so. Some specific & briefer comments below: > [...] > **Option 1:** One-directional coupling in `contrib/scalar/` > > This option was our initial choice because it minimizes the responsibility > of the Git community. While `contrib/scalar/scalar.c` depends on code in > `libgit.a`, the implementation does not require that code to change to > accommodate the needs of Scalar. The test suite and documentation is > separate. > > This does mean that changes to `libgit.a` could break Scalar without any > feedback to the developer that has not compiled Scalar. The Scalar > maintainers would then need to watch for this and send separate updates to > Scalar that fix these dependency breaks. This reduces developer friction, > but might cause some extra burden on the Git maintainer. If these "catch > up to dependency breaks" updates happen only after a release candidate is > out, then maybe this isn't too much of a burden. We don't typically have > release candidates for minor releases, so there is some risk there that > minor releases could include breaks. > > If we make our CI builds include Scalar by default, then the previous > paragraph about developer friction is negated. Without going into exhaustive detail I'll just note as above that much of this describes a state that's got little or nothing to do with the contrib/scalar/* patches on-list, which are thoroughly bi-bidirectionally integrated, build (but don't test) by default etc. > Having Scalar in `contrib/` makes it easy to differentiate it as an > optional component that distributors could choose to include or leave out. Why would the FS path have anything to do with making that easy? E.g. distributors now can: make install NO_REGEX=MineIsBroken That we store the relevant sources in-tree at compat/regex/ doesn't matter to distributors. We could just as well have them at the top-level. The UI we present is just some Makefile knob. Yes, the currently propoed scalar patches would do the same via a sub-Makefile, and e.g. contrib/subtree works that way, but does it *need* to be like that for distributers? No, they'll just read the release notes, and copy whatever the relevant instructions are. > Code in `contrib/` has a lower barrier to entry. In particular, the Scalar > CLI is not intended to be up for debate for historical reasons. If we make > an exception for Scalar but want Scalar integrated outside of `contrib/`, > then are we setting expectations for other tools that might want to be > included? We're not. We can just document the status of individual logical components, separate from in-tree FS paths. > However, projects within `contrib/` do not currently depend on `libgit.a` > the way Scalar does. (This is not historically true, as an older project > ad such dependencies, but has since been ejected.).[...] Well, we've got contrib/buildsystems/, if a thing that builds libgit.a doesn't have an (inter-)dependency on libgit.a I don't know what would :) But yes, we've got no C code in contrib/ currently that links *to* libgit.a, which is one aspect of the breakages I've pointed out with the scalar patches. But as contrib/buildsystems/ shows the "contrib/" directory is not some "hands-off" directory for git.git developers, i.e. if you make certain changes in non-contrib the vs-build CI will break, since it has a hard dependency on that contrib component. So in terms of placing things in that directory unambiguously communicating some "third party" or "top-level doesn't depend on this" intent or state, I don't think it's really doing the work you seem to think it will. > This option explicitly mentions that all knowledge of how to build Scalar > lives within `contrib/scalar/` to avoid disrupting the core `Makefile`. > This has already led to debate about some duplication in the Scalar > `Makefile` and the one at root. As the instigator of particularl that debate: No that's not accurate at all. Well, to be fair I haven't been commenting on some hypothetical "contrib/scalar/" that's "one-directional" and that you "opt-in to building". As noted above I'm rather confused by this "contrib/scalar/" summary describing something that doesn't at all map unto the relevant posted patches. To the extent that I have been discussing this aspect of the actual posted patches it's been because the opposite of "all knowledge of how to build Scalar lives within `contrib/scalar`" is true, as a look at the removed lines in the patch I've posted above shows. Most of that build and test infrastructure has a thoroughly hard dependency on Makefiles at the top-level, in a way that causes more work for anyone who's changing any of that top-level build logic, not less. > **Option 2:** Loose coupling in a `scalar/` directory > > The first issue with this option is that the Scalar CLI is established and > is not up for modification, yet we would be contributing it in a location > that is typically under high scrutiny for things like this. I don't think anyone's been disputing the fundimental point that the whole reason for having this in-tree scalar is for the convenience of some existing users, and that any proposed changes to the CLI UI need to keep that in mind. How would just having it in some subdirectory change any of that? Sorry, I just continue not to understand what a "contrib/scalar/", "scalar/" or whatever prefix would have anything to do with that. > This option also breaks into new territory, because even `git-gui` and > `gitk-git` are not based on C and do not depend on `libgit.a`. If we're running with the idea that a one-level subdirectory of not-really-git.git sources is somehow magical then in-tree "sha1dc/" qualifies for the first part of that. I.e. we build it by default, it's C based, but it doesn't depend on libgit.a. > [...] > Developer friction increases as changes to `libgit.a` that break Scalar > should be fixed within the patches that cause those changes. So again on the "hypothetical contrib/scalar/*" point: This is also true of Johannes's "contrib/scalar/*" patches on-list. So having read this far this "scalar/*" distinction here seems to really be describing something like the current state (but under a different path), and patches for "Option 1" have not been seen on-list. > [...] > **Option 3:** Tight coupling with the entire Git project > > This option removes the ability to opt-out of building the Scalar CLI. > Distributors might need to react to remove the `scalar` executable if they > do not want to include it. I'm apathetic to whether we actually install it by default, but in my patch above that's optional. So "tight coupling" in the sense of not only building it, but also running the tests does not need to entail that we install it by default. > This option sets a risky precedent that new tools can be added to Git > without a rigorous review of the CLI. I think the changes I had to "man git" should cover this, not that the scalar CLI couldn't do with a bit more review/polishing (some of which I discovered recently when trying to get the CI working). > **Phase 1.** Keep code in contrib/ while we contribute the Scalar features. > > Since the current patches on the mailing list are not feature complete, it > can be beneficial to move forward with the code in `contrib/scalar/` until > we reach feature parity. At that point, we could move the source into its > final resting place. Why specifically does it need to be at a different in-tree path before it reaches feature parity? This proposal seems to implicitly assume a lot about the necessity for FS-path-based lifecycle management of components, without really justifying why any of that's needed. Let's just put things in their "final place", document their current state for human consumption somewhere (commit messages, "man git", etc.) and avoid the path-based churn? Why isn't that OK? > [...] > Depending on the final goal, we could drop some of the work that is currently > built within the `contrib/scalar/` directory, such as `-c`/`-C` parsing or > documentation builds. These features would be reimplemented in the new > location, so we can prevent that duplicate effort if we have a different > final location in mind. Just on -c & -C parsing: It's already true that scalar uses libgit.a, it's even true that scalar (admittedly small) parts of libgit.a that no other caller uses directly. So I don't see why having it in a given in-tree location and already needing parts of libgit.a would preclude us from creating APIs specifically for its use. Particularly in the case of git.c where those APIs would benefit any future "top-level program sort of like a git(1)" caller, and would/could even benefit our own sub-commands such as "git commit-graph" et al. E.g. if we lifted out the typo correction cmd dispatching from git.git and could use it for those + scalar.c, that would be very nice for existing in-tree users, and would also reduce code duplication in scalar.c. > [...] > Thanks for your attention to this topic. I look forward to any new "big" > ideas in this space. I think that in this case a series of "small" ideas might cause any preconceived "big" solutions to evaporate once implemented. E.g. in [1] Johannes noted: "Just forget about Scalar's build process. Forget about getting its CI to work. I have all that figured out already. It is all working as well as needed."[1] That "as well as needed" is doing a lot of work, as noted above it was clearly broken on the OSX CI at least. One thing my linked-to does is to fix/change that small aspect, does anyone mind the incremental step of scalar.c not only being built but also tested? If so let them just note their concerns in the releant thread. Once we address such point-by-point issues will any "big" ones remain? Maybe, maybe not. 1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2110221734530.62@xxxxxxxxxxxxxxxxx/