Theodore Tso <tytso@xxxxxxx> wrote:
> On Tue, Dec 19, 2006 at 01:39:30AM -0500, Shawn Pearce wrote:
> > This is why git-fast-import mmaps the file in 128 MiB blocks.
> > The mmap region is usually much larger than the file itself;
> > the application appends to the file via write() then goes back
> > and rereads data when necessary via the already established mmap.
> > It's rare for the application to need to unmap/remap a different block
> > so there really isn't very much page table manipulation overhead.
>
> Yes, but unless you are using the (non-portable, Linux-specific)
> MAP_POPULATE flag to mmap, each time you touch a new page, you end up
> taking a page fault; and so malloc/read/free might *still* be faster.
> I'd encourage you to make the change and benchmark it; the results may
> be surprising.  I played with this with dcraw, the Canon Raw File
> converter, a while back (before MAP_POPULATE was added), where I found
> that with a linear access pattern, if you are reading the entire file,
> it's still marginally faster to use read() over mmap(), because with
> dcraw taking a page fault every 4k of raw file, the system time was
> significantly higher.

Interesting.  There are lots of good reasons to just use pread() in
there instead of mmap.  For one thing, git-fast-import doesn't go back
and hit the already written pack data very often.  Its own in-memory
caches usually perform very well.

-- 
Shawn.
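
For anyone who wants to try the benchmark suggested above, here is a
minimal sketch in C of the two access patterns being compared: a linear
scan through pread() into one reused malloc'd buffer, and the same scan
through a single mmap() of the whole file.  The function names and the
64 KiB CHUNK_SIZE are made up for illustration and are not taken from
git-fast-import or dcraw; the byte sum is only there so that every page
actually gets touched.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffer size for the pread() path; tune when benchmarking. */
#define CHUNK_SIZE (64 * 1024)

/* Sum every byte of the file via pread() into one reused heap buffer.
 * Returns 0 on success, -1 on error. */
int sum_with_pread(const char *path, unsigned long *sum_out)
{
    assert(sum_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char *buf = malloc(CHUNK_SIZE);
    if (!buf) {
        close(fd);
        return -1;
    }

    unsigned long sum = 0;
    off_t off = 0;
    ssize_t n;
    while ((n = pread(fd, buf, CHUNK_SIZE, off)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
        off += n;
    }

    free(buf);
    close(fd);
    if (n < 0)
        return -1;
    *sum_out = sum;
    return 0;
}

/* Sum every byte of the file via one read-only mmap() of the whole file.
 * Without MAP_POPULATE, the first touch of each page takes a page fault.
 * Returns 0 on success, -1 on error. */
int sum_with_mmap(const char *path, unsigned long *sum_out)
{
    assert(sum_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    unsigned long sum = 0;
    if (st.st_size > 0) {           /* mmap() rejects a zero length */
        size_t len = (size_t)st.st_size;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        munmap(p, len);
    }

    close(fd);
    *sum_out = sum;
    return 0;
}
```

Calling each function from its own small main() on the same large file
and running it under time(1) shows user and system time separately,
which is the split described above.  Both runs should start from the
same page cache state, otherwise the comparison mostly measures disk I/O.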