Re: cloning the kernel - why long time in "Resolving 313037 deltas"

Linus Torvalds <torvalds@xxxxxxxx> wrote:
> On Tue, 19 Dec 2006, Theodore Tso wrote:
> > 
> > So the main reason to use mmap, as Linus puts it, is if the management
> > overhead of needing to read lots of small bits of the file makes the
> > use of malloc/read to be a pain in the *ss, then go for it.
> 
> An example of this in git is the regular pack-file accesses. We're MUCH 
> better off just mmap'ing the whole pack-file (or at least big chunks of 
> it) and not having to maintain difficult structures of "this is where I 
> read that part of the file into memory", or read _big_ chunks when 
> quite often we just use a few kB of it.
> 
> So mmap for pack-files does make sense, but probably only when you can 
> mmap big chunks, and are going to access much smaller (random) parts of 
> it.

Yes, exactly.

git-fast-import mmaps the pack file for this very reason.  Every
so often it needs to go back and reread a tree object which has
expired from its own in-memory LRU cache.  This doesn't happen
very often, but when it does we don't know where we are going to
jump to get data from.  mmapping a huge segment of the pack file
(or the whole thing if it's reasonably small) works for this case, as
the OS buffer cache can just take care of it for us.  But as Linus
pointed out, mmap and write() together aren't safe on some systems.
Arrrgh.

However, git-fast-import would probably work just as well (or maybe
slightly better) with pread().  I really should port that code
forward to current Git, use pread() instead, and submit the patch
to Junio.  But so far nobody has shown much interest.
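
A minimal sketch of what the pread() variant could look like, for
rereading an expired object at an arbitrary pack offset.  This is not
the git-fast-import code; the helper name read_exact_at() and its
return convention are made up for illustration.  It only assumes
POSIX pread(), which reads at an explicit offset without moving the
file position, so it does not disturb a writer appending through the
same descriptor.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of git): read exactly `len` bytes
 * starting at `offset` in the file behind `fd`.
 * Returns 0 on success, -1 on I/O error or if EOF is hit early.
 */
static int read_exact_at(int fd, void *buf, size_t len, off_t offset)
{
    unsigned char *p = buf;

    assert(offset >= 0);
    while (len > 0) {
        ssize_t n = pread(fd, p, len, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            return -1;          /* real I/O error */
        }
        if (n == 0)
            return -1;          /* EOF before we got everything */
        /* pread() may return fewer bytes than asked for; keep going */
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The caller still has to know how many bytes to ask for, which is the
bookkeeping that mmap avoids.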


My sliding window pack-file access implementation (which I'm currently
rewriting on top of current Git) tries to work in very large chunks.
By default it's 32 MiB per chunk, but it's user/repository configurable,
so kernel hackers may just set it to 256 MiB and continue to get
one large mmap for quite some time to come.  Of course I would
also like to get it to autoselect the window size rather than
just hardcode it.  :-)

The implementation would prefer a very small number (<8) of very
large chunks (>32 MiB), but is designed to degrade more gracefully
on huge packs on limited address space systems (e.g. 32-bit Windows)
than the current code does.
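
To make the idea concrete, here is a rough sketch of a windowed mmap
reader.  It is not the implementation described above: it keeps a
single window instead of a small set, and struct pack_window,
window_init(), window_use() and window_release() are names invented
for this example.  Only the 32 MiB default comes from the text.  The
sketch assumes the window size is a multiple of the page size, so
that every window start is a valid mmap offset.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DEFAULT_WINDOW_SIZE ((size_t)32 * 1024 * 1024)  /* 32 MiB */

/* Hypothetical single-window state; a real version would keep several. */
struct pack_window {
    int fd;
    off_t file_size;
    size_t window_size;     /* multiple of the page size */
    unsigned char *base;    /* NULL while nothing is mapped */
    off_t start;            /* file offset of base[0] */
    size_t len;             /* bytes currently mapped */
};

static void window_init(struct pack_window *w, int fd,
                        off_t file_size, size_t window_size)
{
    assert(window_size > 0 &&
           window_size % (size_t)sysconf(_SC_PAGESIZE) == 0);
    w->fd = fd;
    w->file_size = file_size;
    w->window_size = window_size;
    w->base = NULL;
    w->start = 0;
    w->len = 0;
}

static void window_release(struct pack_window *w)
{
    if (w->base)
        munmap(w->base, w->len);
    w->base = NULL;
    w->len = 0;
}

/*
 * Return a pointer to the byte at `offset`, remapping if it lies
 * outside the current window.  *avail is set to the number of bytes
 * readable from that pointer before the window ends; data crossing a
 * window boundary is the caller's problem in this sketch.
 * Returns NULL if offset is out of range or mmap() fails.
 */
static const unsigned char *window_use(struct pack_window *w,
                                       off_t offset, size_t *avail)
{
    if (offset < 0 || offset >= w->file_size)
        return NULL;
    if (!w->base || offset < w->start ||
        offset >= w->start + (off_t)w->len) {
        /* slide: drop the old window, map the aligned one we need */
        off_t start = offset - offset % (off_t)w->window_size;
        off_t remain = w->file_size - start;
        size_t len = remain < (off_t)w->window_size
                   ? (size_t)remain : w->window_size;
        void *p;

        window_release(w);
        p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, w->fd, start);
        if (p == MAP_FAILED)
            return NULL;
        w->base = p;
        w->start = start;
        w->len = len;
    }
    *avail = w->len - (size_t)(offset - w->start);
    return w->base + (offset - w->start);
}
```

With a window as large as the pack, this degenerates into the one big
mmap; with a smaller window, address space use stays bounded at the
cost of an occasional munmap/mmap pair.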

-- 
Shawn.