Re: Inexplicably deteriorating performance of Git repositories on Windows

A Large Angry SCM <gitzilla@xxxxxxxxx> · Wed, 24 Nov 2010 16:18:31 -0500

On 11/24/2010 04:00 PM, Dun Peal wrote:
On Wed, Nov 24, 2010 at 5:16 PM, Joshua Jensen
<jjensen@xxxxxxxxxxxxxxxxx>  wrote:
Whenever I want to know exactly what is going on with disk access, I
download Process Monitor from http://sysinternals.com/.

To show only disk access, I filter out entries that begin with TCP,
UDP, and Reg.

Josh

Thanks, we tried that and we don't see a whole lot of disk activity on
the "fast" machines.

One emerging theory is that the "slow" Windows machines differ from
the "fast" ones by how their disk cache works.

So `git status` on a large tree heavily depends on caching. Without
it, it would be slow; with it, it's much faster.

We verified that part since when we reboot a fast Windows machine, the
first run of `git status` is slow (~30s) but the next one is much
faster (~5s).
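
For anyone who wants to repeat the cold-versus-warm comparison on their own
machines, here is a minimal sketch of a timing script. The helper name
`time_command` and the default of three runs are my own choices for
illustration, not something from this thread. It only measures wall-clock
time and says nothing about where the time goes.

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None, runs=3):
    """Run cmd `runs` times in a row and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, cwd=cwd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical usage: pass the path of the working tree as the first argument.
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    for i, secs in enumerate(time_command(["git", "status"], cwd=repo), 1):
        print(f"run {i}: {secs:.2f}s")
```

Run it once right after a reboot and again later; the first number of the
first invocation is the cold run, the rest should be warm.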

We see a similar phenomenon on Linux: the first run is always
significantly slower than the others.

On slow Windows machines, this difference is much less pronounced.

On a typical "slow" machine, if you clone the repo, the first run of
`git status` on it would already be fast (5s). But then you reboot,
and the first run is slow, and later runs only improve to 14s. And you
can't get back the 5s latency unless you re-clone the repo and status
the fresh clone.

So my theory is that there's a cache that on the "fast" machines
aggressively caches the entire tree on a regular `git status` run. On
such a machine, it's enough to run `git status` once, and after that
initial cold run, the rest will be warm... until you reboot the
machine, rinse, repeat.

On a slow machine, however, cache isn't so aggressive. It might be
write-oriented. So when you write out a whole new working tree, that
tree gets cached as it is written. And for the remainder of the
lifetime of that cache, you get the fully-cached performance you see
on the "fast" machines. But then you reboot the machine, and lose the
cache. And since the caching process isn't aggressive, any number of
`git status` runs won't get you back to the fully cached state. You
will only get that on a newly written working copy.
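
One way to probe this theory might be to touch every entry in the working
tree explicitly and then see whether `git status` gets any faster. The
sketch below walks a directory and calls lstat on each entry. The name
`stat_tree` is hypothetical, and it is an assumption that this exercises
the same cache `git status` depends on; if status stays at 14s afterwards,
that assumption (or the theory) is probably wrong.

```python
import os
import time


def stat_tree(root):
    """lstat every file and directory under root; return (entries, seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # Entry vanished or is inaccessible; skip it.
                pass
    return count, time.perf_counter() - start


if __name__ == "__main__":
    entries, secs = stat_tree(".")
    print(f"stat'ed {entries} entries in {secs:.2f}s")
```

Running it twice in a row also gives a rough, git-independent measure of
how much the second pass benefits from whatever the first pass cached.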

What do you think?

How much memory do the fast and slow machines have? How much memory will
Windows use for disk caching? Is it possible that your normal workflow
between status runs is forcing the caches to be pruned due to memory pressure?