Theodore Tso <tytso@xxxxxxx> wrote:
> On Mon, Dec 18, 2006 at 07:13:40PM -0500, Nicolas Pitre wrote:
> > Maybe.  However the mmap() may occur on section of the pack file which
> > has just been written to in order to write even more, always to the same
> > file.  On Linux this is fast because the mmap'd data is likely to still
> > be in the cache.
> >
> > I guess this could be turned into a malloc()/read()/free() with no
> > trouble.
>
> Actually, depending on the size of the chunk, even on Linux
> malloc/read/free can be faster than the mmap/munmap, because
> mmap/munmap calls involve page table manipulations, and even on Linux
> that is often slower or dead even with the memory copy involved with
> using malloc/read.  Even when reading huge chunks of Canon Raw File
> data at a time, I found (experimentally) that it was no faster to use
> mmap() compared to read().  And for small chunks of data, malloc/read
> will definitely win out over mmap(), since the page table operations
> and resulting page faults completely trump the cost of copying the
> bytes from the page cache to the read() buffer.

This is why git-fast-import mmaps 128 MiB blocks from the file at
a time.  The mmap region is usually much larger than the file itself;
the application appends to the file via write(), then goes back and
rereads data when necessary through the already established mmap.
It's rare for the application to need to unmap and remap a different
block, so there really isn't much page table manipulation overhead.

Why isn't git-index-pack doing the same?  Is there some hidden glitch
in some OS somewhere that has a problem with overmapping a file and
appending into it via write()?  I've done that on Mac OS X, Linux,
BSDi, Solaris... never had a problem.

-- 
Shawn.
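
For reference, the overmap-then-append pattern described above can be
sketched in C roughly as follows.  This is only a minimal sketch under
stated assumptions, not git-fast-import's actual code: the function name
overmap_roundtrip(), the MAP_WINDOW constant, and the choice of
PROT_READ with MAP_SHARED are all hypothetical and picked for
illustration.  It also relies on the mapping reflecting data appended
with write(); as far as I know POSIX leaves references to portions of a
mapping that correspond to later file growth unspecified, so this is
behavior that common systems provide in practice rather than a
guarantee.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window size; the message mentions 128 MiB blocks. */
#define MAP_WINDOW ((size_t)128 * 1024 * 1024)

/*
 * Map MAP_WINDOW bytes of a freshly created (empty) file, append `len`
 * bytes with write(), then compare them through the existing mapping.
 * Returns 0 if the bytes seen through the mapping match, -1 on any
 * error or mismatch.  The scratch file is removed before returning.
 */
int overmap_roundtrip(const char *path, const char *data, size_t len)
{
    int rc = -1;
    size_t done = 0;

    assert(len > 0 && len <= MAP_WINDOW);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }

    /* Overmap: the file is 0 bytes long, the mapping is MAP_WINDOW bytes. */
    unsigned char *map = mmap(NULL, MAP_WINDOW, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Append through the descriptor, not through the mapping. */
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n <= 0) {
            goto out;
        }
        done += (size_t)n;
    }

    /*
     * Reread via the established mapping.  Only bytes below the current
     * end of file are touched; pages wholly beyond EOF would fault.
     */
    if (memcmp(map, data, len) == 0) {
        rc = 0;
    }

out:
    munmap(map, MAP_WINDOW);
    close(fd);
    unlink(path);
    return rc;
}
```

A caller would pass a scratch path and some bytes, for example
overmap_roundtrip("/tmp/scratch.pack", buf, buflen), and treat a nonzero
return as "this platform or filesystem did not behave as assumed".  The
point of the single large mapping is that mmap() runs once per window
rather than once per reread.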