On 2025-02-13 at 02:05:47, Jayce Cao wrote:
> My goal is to check the commits to be pushed in pre-push hook to see
> if they contain sensitive data or not.
> I have an assumption that those commits which already exist in remote
> repos have no need to check.

You will probably want to read https://git-scm.com/docs/gitfaq#restrict-with-hooks. It's very easy to bypass the `pre-push` hook locally by using `--no-verify`, and there is no way to detect that, so if you want an effective control, you'll want a different approach.

Note also that I don't believe libgit2 or other library-based Git engines invoke hooks at all, which is another reason you'll probably want a different approach.

> So I read the Git doc and pre-push.sample file, I know that if we push
> to a new branch that the remote does not have,
> $remote_oid will be zero, so we need to examine all commits in this
> branch. We can run `git rev-list $local_oid` to
> get all commits to be examined.
>
> But consider this case, if I'm developing a huge project which has
> millions of commits.
> I create a new branch (we call it feat/awesome-feat) based on the
> master branch on my local repo, and create three commits.
> Then I run the `git push --set-upstream origin feat/awesome-feat`
> command to push the three commits to the remote.
> But when the pre-push hook is called, `git rev-list $local_oid` will
> print millions of commits. The commits except the new three
> already exist in the remote repo. And the `git push` command will send
> data only in the new commits to the remote, instead of all
> history commits.
>
> So I mean we've no idea which commits will be sent to the remote
> indeed in the pre-push hook when pushing to a new branch
> that the remote doesn't have. I found a workaround:
> * Run `git ls-remote -q -h` command to get the commits the remote has.
> * Run `git rev-list $local_oid ^$haves` command to get the commits to
> be pushed. ($haves are the commits obtained from the previous step.)
>
> But this workaround seems to be stupid when the remote has many
> branches. I wonder if there is any better way to get the commits
> to be pushed accurately in the pre-push hook.

Git LFS has an optimization where it uses `git rev-list --not --remotes=origin` (or whatever the remote is). This excludes objects reachable from the remote-tracking refs for the remote in question.

However, this has some limitations. For instance, if the remote is specified as a URL and not as a remote name, then there will never be any remote-tracking branches, and this optimization cannot be used. Notably, I believe EGit (and maybe JGit) _always_ specify the remote as a URL and never as a remote name, so this will not work there. You may wish to inspect the Git LFS source code for more details.

I am not aware of a better way to do this, but as I mentioned above, you may not want to do this at all.

-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
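For illustration, here is a minimal sketch of a pre-push hook that applies the `--not --remotes=<remote>` exclusion described above to newly created branches and uses the ordinary `$remote_oid..$local_oid` range for updated ones. It is not taken from Git LFS. The helper name `list_outgoing_commits` and the scanning placeholder are hypothetical, and the sketch assumes the hook's first argument is a configured remote name. It therefore inherits the URL limitation noted above, and like any hook it can be skipped with `--no-verify`.

```shell
#!/bin/sh
# Sketch of a pre-push hook that lists only commits not already known to be
# on the remote. New branches use the "--not --remotes=<remote>" exclusion.
#
# Assumptions: $1 is a configured remote name such as "origin" (if it is a
# URL, no refs/remotes/<name>/* refs exist and nothing is excluded), and the
# old remote OID of an updated branch exists locally, as pre-push.sample
# also assumes.

# Hypothetical helper: reads Git's pre-push input on stdin and prints the
# commits that will be sent, one per line.
list_outgoing_commits () {
	remote=$1
	# All-zero OID with the repository's hash length (SHA-1 or SHA-256).
	zero=$(git hash-object --stdin </dev/null | sed 's/./0/g')

	while read -r local_ref local_oid remote_ref remote_oid
	do
		if test "$local_oid" = "$zero"
		then
			# Deleting a remote branch sends no commits.
			continue
		elif test "$remote_oid" = "$zero"
		then
			# New branch: skip everything reachable from this remote's
			# remote-tracking refs instead of walking all of history.
			git rev-list "$local_oid" --not --remotes="$remote"
		else
			# Existing branch: only the commits the update adds.
			git rev-list "$remote_oid..$local_oid"
		fi
	done
}

# Git invokes the hook with the remote name (or URL) and the URL as $1 and $2.
if test $# -ge 2
then
	list_outgoing_commits "$1" | sort -u | while read -r commit
	do
		# Hypothetical placeholder: scan "$commit" for sensitive data here.
		# Exiting non-zero fails the hook, since this pipeline is the
		# script's last command.
		echo "would check $commit" >&2
	done
fi
```

Saved as `.git/hooks/pre-push` and made executable, it would list only the three new commits for `git push --set-upstream origin feat/awesome-feat`, provided their base is reachable from some `refs/remotes/origin/*` ref. Pushing to a bare URL falls back to listing the whole history of the branch.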