On Tue, Dec 19, 2006 at 01:39:30AM -0500, Shawn Pearce wrote:
> This is why git-fast-import mmaps 128 MiB blocks from the file at
> a time.  The mmap region is usually much larger than the file itself;
> the application appends to the file via write() then goes back
> and rereads data when necessary via the already established mmap.
> It's rare for the application to need to unmap/remap a different block
> so there really isn't very much page table manipulation overhead.

Yes, but unless you are using the (non-portable, Linux-specific)
MAP_POPULATE flag to mmap, each time you touch a new page you end up
taking a page fault, so malloc/read/free might *still* be faster.  I'd
encourage you to make the change and benchmark it; the results may be
surprising.

I played with this in dcraw, the Canon Raw File converter, a while back
(before MAP_POPULATE was added), and found that with a linear access
pattern, if you are reading the entire file, it's still marginally
faster to use read() over mmap(): with dcraw taking a page fault for
every 4k of raw file, the system time was significantly higher.

So, as Linus puts it, the main reason to use mmap() is convenience: if
the management overhead of needing to read lots of small bits of the
file makes malloc/read a pain in the *ss, then go for it.  But don't
assume that you'll get better performance; in my experience, even on
the hyper-performant Linux kernel, mmap() in general only barely breaks
even with read().  On other systems, things are probably going to be
even worse.

						- Ted
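
P.S.  For anyone who wants to try the experiment, here is a rough,
untested sketch of the kind of micro-benchmark I have in mind.  It is
not git-fast-import code; the buffer size and the checksum loop are
arbitrary choices of mine.  It just scans a file linearly once with
read() and once with mmap() (with and without MAP_POPULATE, where that
flag exists), so the per-page fault cost shows up in the system time.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

static double now_sec(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

/* Linear scan via read() into a fixed user buffer. */
static uint64_t sum_read(const char *path)
{
	char buf[64 * 1024];		/* arbitrary buffer size */
	uint64_t sum = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);
	if (fd < 0) { perror("open"); exit(1); }
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		for (ssize_t i = 0; i < n; i++)
			sum += (unsigned char)buf[i];
	close(fd);
	return sum;
}

/* Linear scan via mmap(); each new page can fault unless prefaulted. */
static uint64_t sum_mmap(const char *path, int populate)
{
	struct stat st;
	uint64_t sum = 0;
	int flags = MAP_PRIVATE;
	int fd = open(path, O_RDONLY);
	if (fd < 0) { perror("open"); exit(1); }
	if (fstat(fd, &st) < 0) { perror("fstat"); exit(1); }
#ifdef MAP_POPULATE			/* Linux-specific: prefault all pages up front */
	if (populate)
		flags |= MAP_POPULATE;
#else
	(void)populate;			/* flag not available on this system */
#endif
	unsigned char *p = mmap(NULL, st.st_size, PROT_READ, flags, fd, 0);
	if (p == MAP_FAILED) { perror("mmap"); exit(1); }
	for (off_t i = 0; i < st.st_size; i++)
		sum += p[i];
	munmap(p, st.st_size);
	close(fd);
	return sum;
}

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	double t0 = now_sec();
	uint64_t a = sum_read(argv[1]);
	double t1 = now_sec();
	uint64_t b = sum_mmap(argv[1], 0);
	double t2 = now_sec();
	uint64_t c = sum_mmap(argv[1], 1);
	double t3 = now_sec();
	printf("read():             %.3fs (sum %llu)\n", t1 - t0, (unsigned long long)a);
	printf("mmap():             %.3fs (sum %llu)\n", t2 - t1, (unsigned long long)b);
	printf("mmap(MAP_POPULATE): %.3fs (sum %llu)\n", t3 - t2, (unsigned long long)c);
	return 0;
}

Keep in mind that the second and third passes will hit the page cache
the first one warmed up, so for a fair comparison drop the cache (or
use separate copies of the file) between runs, and watch the system
time rather than just the wall clock.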