On Thu, Aug 05 2021, Emily Shaffer wrote:

> On Thu, Aug 05, 2021 at 02:17:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> [...]
>> > No, that's in fact as designed, with my model B. The user configured
>> > "echo hi" to run on "verify-commit" events; if those events are
>> > initially used by some wrapper, but later we decide they're a great idea
>> > and absorb the verify-commit event into native Git, then this is working
>> > as intended. I think your argument is based on a misunderstanding of the
>> > design.
>> >
>> > Could it be that the way I provided the examples (my schema after A: and
>> > your schema after B:) made it hard to parse? Sorry about that if so.
>>
>> Aren't you assuming that users who specify a verify-commit will be happy
>> because git's usurping of that will 1=1 match what they were using
>> "verify-commit" for.
>>
>> I'm pointing out that we can't know that, and since you want to make
>> "git hook run" a general thing that runs any <name> of script you've
>> configured, and not just what's in githooks(5) that it becomes very
>> likely that if we add a new hook with some obvious name that we'll
>> either break things for users, or subtly change behavior.
>>
>> Which isn't just theoretical, e.g. I tend to run something like a "git
>> log --check @{u}.." before I run git-send-email, with this configurable
>> hook mechanism having a "git hook run sendemail-check" would be a way I
>> might expose that in my own ~/.gitconfig.
>>
>> But if git-send-email learns a "sendemail-check" and the behavior
>> doesn't exactly match mine; E.g. maybe it similar to pre-auto-gc expects
>> me to return a status code to ask me if I want to abort on a failed
>> --check, but mine expects a revision range to run "log --check".
>>
>> In practice that's a non-issue with the current hook mechanism,
>> i.e. nobody's sticking a script into .git/hooks/my-custom-name and
>> expecting it to do anything useful (and if they are, they have only
>> themselves to blame).
>>
>> Whereas we'd now be actively inviting users to squat on the same
>> namespace we ourselves will need for future hooks.
>
> Yeah, this is a good point. Seems worth a note in the 'git hook run'
> doc, making a point that "you can use this for your wrapper to run
> specific hooks, but be careful about namespace collisions". We're a lot
> less likely to add a hook named "repotool-verify-commit" than we are to
> add a hook named "verify-commit", for example.
>
> I think it's enough to warn about future namespace collisions and make
> an "at your own risk" note.

I might have lost track at this point, but the later examples you show
in this e-mail don't seem to require such a note.

I.e. it's only an issue if we conflate a semantically meaningful slot
like "pre-commit" in the config with one that can also have the meaning
of simply defining an arbitrary user-decided name.

There's no such collision if the config uses e.g.
hook.mycheck.event=pre-commit & hook.mycheck.command=mycmd, as opposed
to hook.pre-commit.command=mycmd.

On the specifics of that example: I don't really care about the
bikeshedding of the config key naming specifics, just the semantics of
not putting user-defined names and hook type names in the same slot if
we can avoid it.

>> > No, but it's something I'm interested in passing as an environment
>> > variable. I didn't add it to this series because it seemed to me to
>> > distract from the core feature. We would like to add it to simplify our
>> > invocations of https://github.com/awslabs/git-secrets, so it's on my
>> > radar.
>>
>> Having such an env var as part of the initial series seems like a
>> sensible thing to have.
>
> Eh. To me, it feels like feature creep. It also is something we could
> add today to the existing hook mechanism (even if it's a little
> pointless since you can basename, like you say), so it feels orthogonal.
> I would prefer not to add it in this series.

Sure, I guess you can add two hook sections to replace e.g.
your {pre,post}-receive hook (which are commonly routed to the same
script with file-based hooks).

Having a single setenv() seems easy enough, and I'd bet it's a way more
common use-case than wanting to skip earlier defined hooks...

>> > I am not sure what it means for a single executable to write "parallel =
>> > true" - it is a single executable.
>> >
>> > Ok, that is me being facetious - I think you are saying we can AND
>> > together all of the 'hook.<thing-with-event-we-care-about>.parallel' to
>> > decide whether or not to run in parallel.
>>
>> Right, the case (whatever the config mechanism) wanting to use several
>> off-the-shelf hooks and accomplish through git some version of this:
>>
>>     parallel -j8 pre-receive-parallel-*.sh &&
>>     parallel -j1 pre-receive-non-parallel-*.sh
>>
>> I.e. since we have N scripts for the "pre-receive" type, and we're
>> expecting to say whether or not parallelism is OK or not, it seems like
>> a natural thing we'll want to declare that differently for some of those
>> than for others.
>>
>> > I would rather not discuss this now, for this series, because regardless
>> > of which config schema we use today, we can figure out "parallel unless
>> > we really don't want it" later on. It is too complex to discuss in the
>> > context of "hey, we should also configure hooks somewhere else". Let's
>> > leave it for future work.
>>
>> The point is that no, we really can't figure it out as easily later on
>> regardless of the config schema.
>>
>> Because with 1=many you can't have 1=many.someAttribute=XYZ without that
>> *.someAttribute=XYZ declaring something for all of 1=many, whereas if
>> it's 1=1 then 1=1.someAttribute.XYZ obviously applies only to that 1=1.
>
> I think this is moot, since we are moving to "all config hooks have a
> name", but my plan previously was to let this be set on a hookcmd.
> Essentially, your suggestion is to make every hook a hookcmd.
> My point was that it's easy to extend [object which represents an
> executable] in the config to include "always run me in series" or "run
> me in series for this specific event" regardless. That is, one could
> imagine, discarding entirely the hookcmd junk and going with the schema
> I sketched in my last email (which lands somewhere between yours and
> mine):

Just to be clear, I don't have any concrete suggestion in mind right now
(actually, as I write this I can only vaguely recall what I suggested
before).

What I have been suggesting is not any specific implementation, but that
we have a bias for the simple over the complex for an initial
implementation. Complexity can always be added later, whereas coming up
with a config schema that's irregular compared to other existing config
in git is something we might regret sooner rather than later.

> [hook "linter"]
>   command = ~/linter.sh
>   event = pre-commit
>   parallel = false
>
> or...
>
> [hook "linter"]
>   command = ~/linter.sh
>   event = pre-commit
>   event = commit-msg
>
> [hook "linter.commit-msg"]
>   parallel = false
>
> Or even...
>
> [hook "linter"]
>   command = ~/linter.sh
>   event = pre-commit
>   event = commit-msg
>   parallel = commit-msg
>
> The possibilities go on, as far as configuration goes.
>
> To me, the harder part of this problem is actually implementing the
> execution. We had some discussions at length early on in the
> config-based hook series about ways to do this kind of complex "some
> stuff needs synchronous execution and some stuff doesn't, in the same
> event" and decided that it mostly resolved to "you ain't gonna need it"
> principle. So I would prefer to discuss this when we find out we do
> actually need it.

What I was mainly going for with "we really can't figure it out as
easily later" above was not that this tweaking of "jobs" or parallelism
was essential per-hook.
But that it was a handy shorthand for a config attribute you might want
to define for hooks, and having what are effectively groups of hooks,
with N "command" or "event" in one section might make things more
complex once you'd want to define optional attributes for one of those
commands or events.

>> [...]
>> >  - I do see value in having an explicit .skip field rather than mapping
>> >    .command to a noop, so "hook.name-of-hook.skip" as described above.
>> >    Of course the method you described will work regardless, since its
>> >    mechanism is based on the inherent result of executing /bin/true.
>>
>> I think we've mainly focused on the theoretical aspect of this, but FWIW
>> I'm still entirely unclear on what this feature is even aimed for.
>>
>> All of the rest of our config does not have an explicit "skip" for
>> anything, just last-set-wins. In terms of a real-world use-case wouldn't
>> a user just edit or comment out the config earlier in ~/.gitconfig, and
>> not "skip" it at the end with "git config [...] --add"?
>>
>> I suspect that the use-case is some Googly centrally managed
>> /etc/gitconfig, but that's just speculation...
>
> Yep, this is exactly why. We've talked often on-list about how we ship
> and configure Git for Googlers, but the upshot is "we pack up 'next' and
> also ship an '/etc/gitconfig'".
>
> But I can also think of one really basic scenario when I'd want to skip
> a hook in one repo without just commenting out my ~/.gitconfig: the
> Gerrit Change-Id hook.
>
> Gerrit requires all commit messages to contain this Change-Id: abc123
> footer. It adds the footer by way of a commit-msg hook. That hook works
> the same regardless of what your Gerrit remote is, so you can run the
> same script on any project that uses Gerrit for code review.
> If, as I have in the past, the vast majority of my projects use Gerrit,
> but I have one project which does not, then I would love to configure
> the Gerrit Change-Id hook globally and un-configure it for my one
> non-Gerrit project.
>
> (At that time, I maintained a subsystem in a project based on Yocto, so
> I needed to regularly contribute to 5-10 projects, all but one of which
> used Gerrit. The one non-Gerrit one used a mailing list. I also had a
> hobby project and my dotfiles, neither of which used Gerrit. This is not
> an uncommon use case.)
>
> I disagree fundamentally that "find and run a noop command like
> /bin/true" is simpler to average users than "skip it by setting a
> config". Like I said below, by including "skip" both approaches will
> work.

In reply to this, and moving things around a bit in the reply:

>> All of that's something you'll need to explain in detail to users, which
>> all seems way more complex than a simple:
>>
>>     To skip a previously defined hook insert a noop-command, any will
>>     do, but setting it to "true" (usually /bin/true) is a handy
>>     convention for doing nothing.
>>
>> I.e. by keeping the config field as doing one thing only you avoid any
>> such collisions etc.
>
> I disagree fundamentally that "find and run a noop command like
> /bin/true" is simpler to average users than "skip it by setting a
> config". Like I said below, by including "skip" both approaches will
> work.

To clarify, I haven't been advocating for that "skip = true" convention
because I think it's a sensible thing per se, but because I think this
use-case is something that an individual configurable feature in git
doesn't need stateful syntax to deal with.

We have any number of multi-value and single-value config within git. I
just don't see why, on balance, hooks need a special syntax to skip
earlier set config for hooks specifically.

E.g.
this Gerrit example would also be true of someone in a corporate setting
using git-send-email, and wanting a list of sendemail.cc on all but
their dotfiles project, or one other non-work project.

Does that mean we need a sendemail.skipCC and special handling for it in
git-send-email.perl?

No, I think we'd generally advise users to just put those projects under
~/work or whatever, and then use config includes to set config for that
group of projects based on the path:

    [includeIf "gitdir:~/work/"]
        path = ~/.gitconfig.d/work

Or, if a hook is really so special that it's needed everywhere, define
it in /etc/gitconfig, and then just make the hook itself do:

    if test "$(git config --bool googleHooks.disableOurGlobalHook)" = "true"
    then
        exit 0
    fi

Which is pretty much (with the hook.* config prefix) how we've advised
users to do this since approximately forever with the sample hooks we
ship.

The advantage of using includes in that way is e.g. that you can easily
see how your hook came to be configured with:

    git config --list --show-origin

I.e. that (by convention) it comes via a conditionally included
~/.gitconfig.d/gerrit file. If it's a multi-value like sendemail.cc the
semantics are also clear, e.g. you can get all values we'll use with
"git config --get-all".

Whereas choosing to implement this with something that *looks* like a
config keyword, but really isn't, is just confusing. We need to explain
in one way how users might arrange for the likes of sendemail.cc to be
defined for some, but not all of their repos, and explain it differently
when it comes to hooks.

There's inherent value in that explanation being the same for both.
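
To make the include-based arrangement concrete, here is a minimal sketch
of it for the multi-valued sendemail.cc case. It is only an illustration
under some assumptions: it runs against a throwaway HOME so that no real
config is touched, and the directory layout (~/work/project, ~/dotfiles)
and the example.com addresses are made up. Only includeIf, sendemail.cc,
"--get-all" and "--show-origin" come from the discussion itself.

```shell
#!/bin/sh
# Sketch: scope a multi-valued config key to a group of repositories
# with includeIf. Paths and addresses are hypothetical.
set -e

# Throwaway HOME, so the real ~/.gitconfig is never read or written.
# "pwd -P" resolves symlinks in the temporary path.
tmp=$(mktemp -d)
HOME=$(cd "$tmp" && pwd -P)
export HOME
GIT_CONFIG_NOSYSTEM=1    # ignore any /etc/gitconfig for this demo
export GIT_CONFIG_NOSYSTEM
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL GIT_DIR GIT_WORK_TREE

mkdir -p "$HOME/.gitconfig.d" "$HOME/work"

# Global config: only pulls in the work settings for repos under ~/work/.
cat >"$HOME/.gitconfig" <<\EOF
[includeIf "gitdir:~/work/"]
    path = ~/.gitconfig.d/work
EOF

# Conditionally included file carrying the multi-valued key.
cat >"$HOME/.gitconfig.d/work" <<\EOF
[sendemail]
    cc = team@example.com
    cc = lead@example.com
EOF

git init -q "$HOME/work/project"
git init -q "$HOME/dotfiles"

# Inside ~/work/ both values are visible; elsewhere the key is unset,
# so "git config" exits non-zero and we fall back to an empty string.
work_cc=$(git -C "$HOME/work/project" config --get-all sendemail.cc)
other_cc=$(git -C "$HOME/dotfiles" config --get-all sendemail.cc || true)

# --show-origin reports which file each value came from.
origin=$(git -C "$HOME/work/project" config --show-origin --get-all sendemail.cc)

printf 'work repo:\n%s\n' "$work_cc"
printf 'dotfiles repo: [%s]\n' "$other_cc"
printf 'origin:\n%s\n' "$origin"
```

The repository outside ~/work/ needs no "skip" entry at all, since it
never sees the values in the first place, and "--show-origin" points at
the included file for the one that does.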