On 09/20/2017 08:50 PM, Jeff King wrote: > On Tue, Sep 19, 2017 at 08:22:22AM +0200, Michael Haggerty wrote: >> If `packed-refs` was already sorted, then (if the system allows it) we >> can use the mmapped file contents directly. But if the system doesn't >> allow a file that is currently mmapped to be replaced using >> `rename()`, then it would be bad for us to keep the file mmapped for >> any longer than necessary. So, on such systems, always make a copy of >> the file contents, either as part of the sorting process, or >> afterwards. > > So this sort-of answers my question from the previous commit (why we > care about the distinction between NONE and TEMPORARY), since we now > start treating them differently. > > But I'm a bit confused why there's any advantage in the TEMPORARY case > to doing the mmap-and-copy versus just treating it like NONE and reading > it directly. > > Hmm, I guess it comes down to the double-allocation thing again? Let me > see if I have this right: > > 1. For NO_MMAP, we'd read the buffer once. If it's sorted, good. If > not, then we temporarily hold it in memory twice while we copy it > into the new sorted buffer. > > 2. For TEMPORARY, we mmap once. If it's sorted, then we make a single > copy. If it's not sorted, then we do the copy+sort as a single > step. > > 3. For MMAP_OK, if it's sorted, we're done. Otherwise, we do the > single copy. > > So this is really there to help the TEMPORARY case reading an old > unsorted file avoid needing to use double-the-ram during the copy? > > That seems like a good reason (and it does not seem to add too much > complexity). Yes, that's correct. A deeper question is whether it's worth this extra effort to optimize the conversion from "unsorted" to "known-sorted", which it seems should only happen once per repository. But * Many more Git commands read the `packed-refs` file than write it. So an "unsorted" file might have to be read multiple times before the first time it is rewritten with the "sorted" trait. * Users might have multiple Git clients writing their `packed-refs` file (e.g., the git CLI plus JGit in Eclipse), and they might not upgrade both clients at the same time to versions that write the "sorted" trait. So a single repository might go back and forth between "unsorted" and "known-sorted" multiple times in its life. * Theoretically, somebody might have a `packed-refs` file that is so big that it takes up more than half of memory. I think there are scenarios where such a file could be converted to "sorted" form in `MMAP_TEMPORARY` mode but would fail in `MMAP_NONE` mode. On the downside, in my benchmarking on Linux, there were hints that the `MMAP_TEMPORARY` branch might be a *tiny* bit slower than the `MMAP_OK` branch when operating on a known-sorted `packed-refs` file. But the speed difference was barely detectable (and might be illusory). And anyway, the speed difference on Linux is moot, since on Linux `MMAP_OK` mode will usually be used. It *would* be interesting to know if there is a significant speed difference on Windows. Dscho? Michael