Re: cloning the kernel - why long time in "Resolving 313037 deltas"

Shawn Pearce <spearce@xxxxxxxxxxx> · Tue, 19 Dec 2006 20:58:07 -0500

Theodore Tso <tytso@xxxxxxx> wrote:
> On Tue, Dec 19, 2006 at 01:39:30AM -0500, Shawn Pearce wrote:
> > This is why git-fast-import mmaps 128 MiB blocks from the file at
> > a time.  The mmap region is usually much larger than the file itself;
> > the application appends to the file via write() then goes back
> > and rereads data when necessary via the already established mmap.
> > It's rare for the application to need to unmap/remap a different block
> > so there really isn't very much page table manipulation overhead.
> 
> Yes, but unless you are using the (non-portable, Linux specific)
> MAP_POPULATE flag to mmap, each time you touch a new page, you end up
> taking a page fault; and so malloc/read/free might *still* be faster.
> I'd encourage you to make the change and benchmark it; the results may
> be surprising.  I played with this with dcraw, the Canon Raw File
> converter a while back (before MAP_POPULATE was added), where I found
> that with a linear access pattern, if you are reading the entire file,
> it's still marginally faster to use read() over mmap(), because with
> dcraw taking a page fault every 4k of raw file, the system time was
> significantly higher.

Interesting.  There are good reasons to just use pread() in there
instead of mmap().  For one thing, git-fast-import doesn't go back
and hit the already written pack data very often; its own in-memory
caches usually perform very well.
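
To make the idea concrete, here is a rough sketch of the append via
write(), reread via pread() pattern.  This is not git-fast-import code;
the helper names read_fully_at() and append_then_reread() are made up
for illustration, and the error handling is deliberately minimal.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly len bytes starting at offset off.
 * Returns 0 on success, -1 on I/O error or if EOF arrives early.
 */
static int read_fully_at(int fd, void *buf, size_t len, off_t off)
{
	char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = pread(fd, p, len, off);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (n == 0)
			return -1;	/* EOF before len bytes were read */
		p += n;
		off += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical demo: append data to path with write(), then read the
 * same bytes back with pread() into a malloc'd buffer (no mmap).
 * Returns 0 if the bytes read back match what was written.
 */
static int append_then_reread(const char *path, const void *data, size_t len)
{
	const char *src = data;
	size_t left = len;
	char *buf;
	off_t start;
	int ret = -1;
	int fd = open(path, O_RDWR | O_CREAT | O_APPEND, 0600);

	if (fd < 0)
		return -1;
	start = lseek(fd, 0, SEEK_END);	/* offset where our data will land */
	if (start < 0)
		goto out;
	while (left > 0) {
		ssize_t n = write(fd, src, left);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto out;
		}
		src += n;
		left -= (size_t)n;
	}
	buf = malloc(len ? len : 1);
	if (!buf)
		goto out;
	if (read_fully_at(fd, buf, len, start) == 0 &&
	    memcmp(buf, data, len) == 0)
		ret = 0;
	free(buf);
out:
	close(fd);
	return ret;
}
```

Each reread costs one system call plus a copy, but it touches no new
page table entries, so there is no per-page fault.  Whether that
actually beats mmap here would still need to be benchmarked.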

-- 
Shawn.