Hi Michael, On Thu, 21 Sep 2017, Michael Haggerty wrote: > On 09/20/2017 08:50 PM, Jeff King wrote: > > On Tue, Sep 19, 2017 at 08:22:22AM +0200, Michael Haggerty wrote: > >> If `packed-refs` was already sorted, then (if the system allows it) > >> we can use the mmapped file contents directly. But if the system > >> doesn't allow a file that is currently mmapped to be replaced using > >> `rename()`, then it would be bad for us to keep the file mmapped for > >> any longer than necessary. So, on such systems, always make a copy of > >> the file contents, either as part of the sorting process, or > >> afterwards. > > > > So this sort-of answers my question from the previous commit (why we > > care about the distinction between NONE and TEMPORARY), since we now > > start treating them differently. > > > > But I'm a bit confused why there's any advantage in the TEMPORARY case > > to doing the mmap-and-copy versus just treating it like NONE and reading > > it directly. > > > > Hmm, I guess it comes down to the double-allocation thing again? Let me > > see if I have this right: > > > > 1. For NO_MMAP, we'd read the buffer once. If it's sorted, good. If > > not, then we temporarily hold it in memory twice while we copy it > > into the new sorted buffer. > > > > 2. For TEMPORARY, we mmap once. If it's sorted, then we make a single > > copy. If it's not sorted, then we do the copy+sort as a single > > step. > > > > 3. For MMAP_OK, if it's sorted, we're done. Otherwise, we do the > > single copy. > > > > So this is really there to help the TEMPORARY case reading an old > > unsorted file avoid needing to use double-the-ram during the copy? > > > > That seems like a good reason (and it does not seem to add too much > > complexity). > > Yes, that's correct. A deeper question is whether it's worth this extra > effort to optimize the conversion from "unsorted" to "known-sorted", > which it seems should only happen once per repository. But > > * Many more Git commands read the `packed-refs` file than write it. > So an "unsorted" file might have to be read multiple times before > the first time it is rewritten with the "sorted" trait. > > * Users might have multiple Git clients writing their `packed-refs` > file (e.g., the git CLI plus JGit in Eclipse), and they might not > upgrade both clients at the same time to versions that write the > "sorted" trait. So a single repository might go back and forth > between "unsorted" and "known-sorted" multiple times in its > life. > > * Theoretically, somebody might have a `packed-refs` file that is so > big that it takes up more than half of memory. I think there are > scenarios where such a file could be converted to "sorted" form > in `MMAP_TEMPORARY` mode but would fail in `MMAP_NONE` mode. > > On the downside, in my benchmarking on Linux, there were hints that the > `MMAP_TEMPORARY` branch might be a *tiny* bit slower than the `MMAP_OK` > branch when operating on a known-sorted `packed-refs` file. But the > speed difference was barely detectable (and might be illusory). And > anyway, the speed difference on Linux is moot, since on Linux `MMAP_OK` > mode will usually be used. It *would* be interesting to know if there is > a significant speed difference on Windows. Dscho? Sadly, I lack the time to test this thoroughly, but my experience is that those things are dominated by I/O on Windows, i.e. that the malloc() and copy won't matter all that much. In any case, the same argument you make for Linux holds, in reverse, for Windows: we cannot use MMAP_OK on Windows, so the point is moot ;-) Ciao, Dscho