Hi Patrick, Sorry for the slow response. I was out of the office last week and am only just now getting a chance to catch up on the backlog of emails that I missed. On Thu, Jun 04, 2020 at 09:36:32AM +0200, Patrick Steinhardt wrote: > On Wed, Jun 03, 2020 at 10:51:42AM -0600, Taylor Blau wrote: > > Hi Patrick, > > > > On Wed, Jun 03, 2020 at 02:27:50PM +0200, Patrick Steinhardt wrote: > > > In order to test the impact on the case where we don't have any > > > "reference-transaction" hook installed in the repository, this commit > > > introduces a new performance test for git-update-refs(1). Run against an > > > empty repository, it produces the following results: > > > > > > Test HEAD~ HEAD > > > ------------------------------------------------------------------------------ > > > 1400.2: update existing reference 2.05(1.58+0.54) 2.08(1.58+0.57) +1.5% > > > 1400.3: create and destroy reference 1.79(1.38+0.49) 1.82(1.39+0.51) +1.7% > > > > > > So the overhead is around ~1.5%. Given that git-update-refs(1) is a > > > near-direct wrapper around reference transactions, there likely is no > > > other command that is impacted much worse than this. > > > > This is a serious performance regression that is worth considering. For > > example, a 1.5% slow-down on average in reference transactions would > > cause be enough for me to bisect the regression down to see what > > changed. > > > > What are ways that this could be avoided? I bet that this would work > > quite well with Emily's work on hooks, where we could check in the > > configuration first whether a hook is even configured. > > > > Could this be integrated with that? If not, could you cache the result > > of whether or not the hook exists, and/or implement some mechanism to > > say something like, for eg., "only run reference transaction hooks > > core.blah = 1" is true? > > I wasn't aware of her work until now, so I'll take a look. > > Meanwhile, I toyed around with performance and tried out two different > caching mechanisms: > > - map-cache: `find_hook()` uses a map of hook names mapped to their > resolved hook path (or `NULL` if none was found). This is a > generic mechanism working across all hooks, but has some overhead > in maintaining the map. > > - reftx-cache: `run_transaction_hook()` caches the path returned by > `find_hook()`. It's got less overhead as it only caches the path, > but obviously only applies to the reference-transaction hook. > > In theory, we could go even further and cache the hook's file > descriptor, executing it via fexecve(3P) whenever it's required, but I > didn't go down that route yet. This would also solve the atomicity > issue, though, if the administrator replaces the reference-transactions > hook halfway through the transaction. > > Benchmarking results are mixed, mostly in the sense that I can choose > which run of the benchmarks I take in order to have my own work look > better or worse, as timings vary quite a lot between runs. Which > probably hints at the fact that the benchmarks themselves aren't really > reliable. The issue is that a single git-update-ref(1) run finishes so > quick that it's hard to measure with our benchmarks, but spawning > thousands of them to get something different than 0.0s very much depends > on the operating system and thus fluctuates. On the other hand, if we > only spawn a single git-update-refs and have it perform a few thousand > ref updates, overhead of the hook will not show at all as it is only > executed once per prepare/commit of the transaction. > > The following timings are for the case where we execute > > git update-ref refs/heads/update-branch PRE POST && > git update-ref refs/heads/update-branch POST PRE > > respectively > > git update-ref refs/heads/new POST && > git update-ref -d refs/heads/new > > a thousand times: > > Test master ref-hook-no-cache ref-hook-map-cache ref-hook-reftx-cache > ------------------------------------------------------------------------------------------------------------------------------ > 1400.2: update existing reference 1.96(1.50+0.53) 2.00(1.54+0.53) +2.0% 2.02(1.54+0.55) +3.1% 1.98(1.52+0.52) +1.0% > 1400.3: create and destroy reference 1.74(1.33+0.49) 1.77(1.38+0.47) +1.7% 1.77(1.36+0.48) +1.7% 1.76(1.35+0.49) +1.1% Huh. It is super interesting (to me, at least) that caching the set of hooks that are on disk and should be run makes this *slower* than without the cache. What's going on there? In p1400.2, I'd expect to see 'ref-hook-map-cache' attain at most a 2.0% slow-down, plus a little bit to process the hook when it is there, but not much more than that. I think that this just seems a little contrived to me. I can understand why server administrators may want this feature, but the general user-base of Git doesn't seem to stand to benefit much from this change (in my own mind, at least). So... I'm not sure where this leaves us. Maybe a 2.0% slow-down on an already fast 'git update-ref' invocation for the average user won't be noticeable. It certainly *will* be noticeable to administrators who processes a much higher volume of such transactions. > For such a short-lived program like git-update-refs(1), one can see that > the overhead of using a map negatively impacts performance compared to > the no-cache case. But using the reftx-cache roughly cuts the overhead > in half as expected, as we only need to look up the hook once instead of > twice. > > And here's the results if we use a single `git update-ref --stdin` with a > thousand reference updates at once: > > Test master ref-hook-no-cache ref-hook-map-cache ref-hook-reftx-cache > ------------------------------------------------------------------------------------------------------------------------ > 1400.2: git update-ref --stdin 0.21(0.09+0.12) 0.21(0.07+0.14) +0.0% 0.21(0.07+0.13) +0.0% 0.21(0.07+0.13) +0.0% > > As expected, there's nothing much to be seen here because there's only a > single transaction for all ref updates and, as a result, at most two > calls to `access(refhook_path, X_OK)`. Better, but I have to imagine that real-world usage will look much more like a thousand tiny transactions than one large one. > Patrick Thanks, Taylor