Linus Torvalds <torvalds@xxxxxxxx> wrote:
> On Tue, 19 Dec 2006, Theodore Tso wrote:
> >
> > So the main reason to use mmap, as Linus puts it, is if the management
> > overhead of needing to read lots of small bits of the file makes the
> > use of malloc/read to be a pain in the *ss, then go for it.
>
> An example of this in git is the regular pack-file accesses. We're MUCH
> better off just mmap'ing the whole pack-file (or at least big chunks of
> it) and not having to maintain difficult structures of "this is where I
> read that part of the file into memory", or read _big_ chunks when
> quite often we just use a few kB of it.
>
> So mmap for pack-files does make sense, but probably only when you can
> mmap big chunks, and are going to access much smaller (random) parts of
> it.

Yes, exactly.

git-fast-import mmaps the pack file for this very reason. Every so
often it needs to go back and reread a tree object which has expired
from its own in-memory LRU cache. This doesn't happen very often, but
when it does we don't know where we are going to jump to get data from.
mmap'ing a huge segment of the pack file (or the whole thing if it's
reasonably small) works for this case, as the OS buffer cache can just
take care of it for us.

But as Linus pointed out, mmap and write() together aren't safe on some
systems. Arrrgh.

However, git-fast-import would probably work just as well (or maybe
slightly better) with pread(). I really should port that code forward
to current Git, use pread() instead, and submit the patch to Junio. But
nobody really showed a lot of interest.

My sliding window pack-file access implementation (which I'm currently
rewriting on top of current Git) tries to work in very large chunks. By
default it's 32 MiB per chunk, but it's configurable per user/repository,
so kernel hackers may just set it to 256 MiB and continue to get one
large mmap for quite some time to come.

Of course I would also like to get that to autoselect the window size
rather than just hardcode it.
:-)

The implementation would prefer a very small number (<8) of very large
chunks (>32 MiB), but is designed to degrade more gracefully on huge
packs on systems with limited address space (e.g. 32-bit Windows) than
the current code does.

-- 
Shawn.
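
For illustration only, here is a minimal C sketch of the two access
strategies discussed in this message: mapping one aligned window of a
large file, and reading a small piece with pread() instead. The names
pack_window, window_open, window_close and read_at are hypothetical and
are not taken from Git or git-fast-import. The sketch assumes a POSIX
system and a window size that is a multiple of the page size, and it
leaves out the LRU bookkeeping a real sliding-window implementation
would need.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type: one mapped chunk ("window") of a pack file. */
struct pack_window {
    unsigned char *base; /* start of the mapping */
    off_t offset;        /* file offset that base[0] corresponds to */
    size_t len;          /* number of bytes mapped */
};

/*
 * Map the window_size-aligned chunk of fd that contains file offset pos.
 * Assumes window_size is a multiple of the system page size, since mmap()
 * requires a page-aligned file offset. The last window of the file is
 * shortened to end at file_size. Returns 0 on success, -1 on failure.
 */
int window_open(struct pack_window *w, int fd, off_t file_size,
                off_t pos, size_t window_size)
{
    assert(window_size > 0 && pos >= 0 && pos < file_size);

    off_t start = pos - (pos % (off_t)window_size);
    off_t remaining = file_size - start;
    size_t len = remaining < (off_t)window_size ? (size_t)remaining
                                                : window_size;

    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, start);
    if (p == MAP_FAILED)
        return -1;

    w->base = p;
    w->offset = start;
    w->len = len;
    return 0;
}

/* Unmap a window previously set up by window_open(). */
void window_close(struct pack_window *w)
{
    if (w->base)
        munmap(w->base, w->len);
    w->base = NULL;
    w->len = 0;
}

/*
 * pread() alternative: copy up to n bytes starting at file offset pos
 * into buf, looping over short reads. Returns the number of bytes read
 * (less than n only at end of file) or -1 on error. A production version
 * would probably also retry on EINTR.
 */
ssize_t read_at(int fd, void *buf, size_t n, off_t pos)
{
    size_t done = 0;

    while (done < n) {
        ssize_t r = pread(fd, (char *)buf + done, n - done,
                          pos + (off_t)done);
        if (r < 0)
            return -1;
        if (r == 0)
            break; /* end of file */
        done += (size_t)r;
    }
    return (ssize_t)done;
}
```

With the 32 MiB default mentioned above, window_size would be
(size_t)32 << 20. A byte at file offset pos is then found at
w.base[pos - w.offset] for as long as pos stays inside the window.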