Re: performance on repack

Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 8/12/07, Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> wrote:
> > Have you considered the impact on memory usage, if there are large
> > blobs in the repository?
> 
> The process size maxed at 650MB. I'm in 64b mode so there is no
> virtual memory limit.
> 
> On 32b there's windowing code for accessing the packfile since we can
> run out of address space, does this code get turned off for 64b?

The windowing code you are talking about defaults as follows:

  Parameter                  32b      64b
  -----------------------------------------
  core.packedGitWindowSize    32M     1G
  core.packedGitLimit        256M     8G

So I doubt you are having issues with the windowing code on a 64b
system, unless your repository is just *huge*.  I don't think anyone
has a Git repository that exceeds 8G, though the window size of 1G
might be a tad too small if there are many packfiles and they are
each larger than 1G.
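For a repository where those defaults do prove too small, both limits
can be raised in the config file; something like this (the values here
are just an illustration, not a recommendation):

```ini
[core]
	# mmap window size per packfile mapping
	packedGitWindowSize = 1g
	# total bytes of packfile allowed mapped at once
	packedGitLimit = 8g
```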
 
> > * On the other hand, we could run all try_delta operations for one object
> >   in parallel. This way, we would not need much more memory, but it would
> >   require more synchronization (and more complex code).
> 
> This solution was my first thought too. Use the main thread to get
> everything needed for the object into RAM, then multi-thread the
> compute bound, in-memory delta search operation. Shared CPU caches
> might make this very fast.

I have been thinking about doing this, especially now that the
default window size is much larger.  I think the default is up as
high as 50, which means we'd keep that shiny new UltraSPARC T2 busy.
Not that I have one...  so anyone from Sun is welcome to send me
one if they want.  ;-)
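For reference, the delta search window size being discussed is itself
configurable, so the amount of work available to parallelize per object
can be tuned; e.g. (value illustrative, matching the figure above):

```ini
[pack]
	# number of candidate base objects tried per object during repack
	window = 50
```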

I'm not sure it's that complex to run all try_delta calls of the
current window in parallel.  It might be a simple enough change that
it's actually worth the extra complexity, especially with multi-core
systems being so readily available.  Repacking is the most CPU
intensive operation Git performs, and also the one that is easiest
to make parallel.

Maybe someone else will beat me to it, but if not I might give such
a patch a shot in a few weeks.

-- 
Shawn.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
