Previously, any time we wanted to read even a single reference from the `packed-refs` file, we parsed the whole file and stored it in an elaborate structure in memory called a `ref_cache`. Subsequent reference lookups or iterations over some or all of the references could be done by reading from the `ref_cache`. But for large `packed-refs` files, the time needed to parse the file, and the memory needed to cache its contents, can be quite significant. This is partly because building the cache costs lots of little memory allocations. (And lest you think that most Git commands can be executed by reading at most a couple of loose references, remember that almost any command that reads objects has to look for replace references (unless they are turned off in the config), and *that* necessarily entails reading packed references.) Following lots of work to extract the `packed_ref_store` into a separate module and decouple it from the `files_ref_store`, it is now possible to fundamentally change how the `packed-refs` file is read. * `mmap()` the whole file rather than `read()`ing it. * Instead of parsing the complete file at once into a `ref_cache`, parse the references out of the file contents on demand. * Use a binary search to find, very quickly, the reference or group of references that needs to be read. Parse *only* those references out of the file contents, without creating in-memory data structures at all. In rare cases this change might force parts of the `packed-refs` file to be read multiple times, but that cost is far outweighed by the fact that usually most of the `packed-refs` file doesn't have to be read *at all*. Note that the binary search optimization requires the `packed-refs` file to be sorted by reference name. We have always written them sorted, but just in case there are clients that don't, we assume the file is unsorted unless its header lists a `sorted` trait. From now on, we write the file with that trait. If the file is not sorted, it is sorted on the fly in memory. For a repository with only a couple thousand references and a warm disk cache, this change doesn't make a very significant difference. But for repositories with very large numbers of references, the difference start to be significant: A repository with 54k references (warm cache): git 2.13.1 this branch git for-each-ref 464 ms 452 ms git for-each-ref (no output) 66 ms 47 ms git for-each-ref (0.6% of refs) 47 ms 9 ms git for-each-ref (0.6%, no output) 41 ms 2 ms git rev-parse 32 ms 2 ms A repository (admittedly insane, but a real-life example) with 60M references (warm cache): git 2.13.1 this branch git for-each-ref (no output) 84000 ms 61000 ms git rev-parse 40000 ms 2 ms This branch applies on top of mh/packed-ref-transactions. It can also be obtained from my git fork [1] as branch `mmap-packed-refs`. Michael [1] https://github.com/mhagger/git Jeff King (1): prefix_ref_iterator: break when we leave the prefix Michael Haggerty (19): ref_iterator: keep track of whether the iterator output is ordered packed_ref_cache: add a backlink to the associated `packed_ref_store` die_unterminated_line(), die_invalid_line(): new functions read_packed_refs(): use mmap to read the `packed-refs` file read_packed_refs(): only check for a header at the top of the file read_packed_refs(): make parsing of the header line more robust read_packed_refs(): read references with minimal copying packed_ref_cache: remember the file-wide peeling state mmapped_ref_iterator: add iterator over a packed-refs file mmapped_ref_iterator_advance(): no peeled value for broken refs packed_ref_cache: keep the `packed-refs` file open and mmapped read_packed_refs(): ensure that references are ordered when read packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator` packed_read_raw_ref(): read the reference from the mmapped buffer ref_store: implement `refs_peel_ref()` generically packed_ref_store: get rid of the `ref_cache` entirely ref_cache: remove support for storing peeled values mmapped_ref_iterator: inline into `packed_ref_iterator` packed-backend.c: rename a bunch of things and update comments refs.c | 22 +- refs/files-backend.c | 54 +-- refs/iterator.c | 47 ++- refs/packed-backend.c | 896 +++++++++++++++++++++++++++++++++++++------------- refs/ref-cache.c | 44 +-- refs/ref-cache.h | 35 +- refs/refs-internal.h | 26 +- 7 files changed, 761 insertions(+), 363 deletions(-) -- 2.14.1