Re: Monitoring a repository for changes

Eric Wong <e@xxxxxxxxx> · Wed, 21 Jun 2017 19:52:52 +0000

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> On Wed, Jun 21 2017, Tim Hutt jotted:
> 
> > Hi,
> >
> > Currently if you want to monitor a repository for changes there are
> > three options:
> >
> > * Polling - run a script to check for updates every 60 seconds.
> > * Server side hooks
> > * Web hooks (on Github, Bitbucket etc.)
> >
> > Unfortunately for many (most?) cases server-side hooks and web hooks
> > are not suitable. They require you to both have admin access to the
> > repo and have a public server available to push updates to. That is a
> > huge faff when all I want to do is run some local code when a repo is
> > updated (e.g. play a sound).

Yeah, it kinda sucks that way.

Currently, for one of my public-inbox mirrors which has ssh
access to the primary server on public-inbox.org, I have:

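	#!/bin/sh
	# outer loop: reconnect whenever the ssh session drops
	while true
	do
		# GNU tail(1) uses inotify to avoid polling on Linux
		# info/refs is rewritten by update-server-info on the server,
		# so each new line of output means some ref changed
		ssh public-inbox.org tail -F /path/to/git-vger.git/info/refs |
				while read -r sha1 ref
		do
			for GIT_DIR in git-vger.git
			do
				export GIT_DIR
				git fetch || continue
				git update-server-info
				public-inbox-index # update Xapian index
			done
		done
	done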

It's not perfect as it requires multiple processes on the
server, but it's better than polling for my limited use.

> > Currently people resort to polling
> > (https://stackoverflow.com/a/5199111/265521) which is just ugly. I
> > would like to propose that there should be a fourth option that uses a
> > persistent connection to monitor the repo. It would be used something
> > like this:
> >
> >     git watch https://github.com/git/git.git
> >
> > or
> >
> >     git watch git@xxxxxxxxxx:git/git.git
> >
> > It would then print simple messages to stdout. The complexity of what
> > it prints is up for debate: it could be something as simple as
> > "PUSH\n", or it could include more information, e.g. JSON-encoded
> > information about the commits. I'd be happy with just "PUSH\n" though.
> 
> Insofar as this could be implemented in some standard way in Git it's
> likely to have a large overlap with the "protocol v2" that keeps coming
> up here on-list. You might want to search for past threads discussing
> that.

Yeah, it hasn't been a priority for me, either...

> > In terms of implementation, the HTTP transport could use Server-Sent
> > Events, and the SSH transport can pretty much do whatever so that
> > should be easy.
> 
> In case you didn't know, any of the non-trivially sized git hosting
> providers (e.g. github, gitlab) provide you access over ssh, but you
> can't just run any arbitrary command, it's a tiny set of whitelisted
> commands. See the "git-shell" manual page (github doesn't use that exact
> software, but something similar).
> 
> But overall, it would be nice to have some rationale for this approach
> other than that you think polling is ugly. There are a lot of advantages
> to polling for something you don't need near-instantly. E.g. imagine how
> many active connections a site like GitHub would need to handle if
> something like this became widely used. That's in a lot of ways harder
> to scale and load balance than just having clients that poll something
> that's trivially cached as static content.

Polling becomes more expensive with TLS and high-latency
connections, and also increases power consumption if done
frequently for redundancy purposes.
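
For comparison, a polling client can at least skip the fetch when
nothing changed by comparing ref listings.  A rough, untuned sketch:
refs_changed and the state file are names made up for illustration,
and it still pays the connection setup cost on every poll.

```shell
#!/bin/sh
# refs_changed STATE_FILE (hypothetical helper): read a ref listing on
# stdin, compare it with the listing saved in STATE_FILE, then save it.
# Returns 0 if the listing changed (or no state existed yet), 1 if it is
# unchanged, 2 if the listing was empty or could not be stored.
refs_changed() {
	state=$1
	new=$state.new.$$
	cat >"$new" || return 2
	if ! test -s "$new"
	then
		# empty listing: most likely the remote command failed
		# (an empty repository looks the same), keep the old state
		rm -f "$new"
		return 2
	fi
	if test -f "$state" && cmp -s "$state" "$new"
	then
		rm -f "$new"
		return 1
	fi
	mv "$new" "$state"
}
```

Assuming $URL points at the repository, something like
"git ls-remote "$URL" | refs_changed "$HOME/.refs-state" && git fetch"
inside a "while sleep 60" loop would only fetch after a ref moved.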

I've long wanted to do something better to allow others to keep
public-inbox mirrors up-to-date.  Having only 64-128 bytes of
userspace overhead per connection should be totally doable
based on my experience working on cmogstored, at which point
port exhaustion (or TLS overhead for HTTPS) will become the
limiting factor.

But perhaps a cheaper option might be the traditional email/IRC
notification and having a client-side process watch for that
before fetching.
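
A minimal sketch of such a client-side watcher, assuming the
notifications already arrive as lines of text on stdin (e.g. from a
mail filter or an IRC client log).  on_notify, the pattern and the log
file below are hypothetical, not an existing tool:

```shell
#!/bin/sh
# on_notify PATTERN COMMAND... (hypothetical helper): read notification
# lines on stdin and run COMMAND once for every line matching the
# shell glob PATTERN.  Non-matching lines are ignored.
on_notify() {
	pattern=$1
	shift
	while IFS= read -r line
	do
		case $line in
		$pattern)
			# </dev/null keeps COMMAND from eating the
			# remaining notifications on stdin
			"$@" </dev/null || echo >&2 "command failed: $*"
			;;
		esac
	done
}
```

It could be driven by something like
"tail -F notifications.log | on_notify '*git-vger.git*' git fetch",
so a fetch only happens after a notification mentioning the repo.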