On Fri, Feb 02, 2018 at 08:58:52PM -0500, Derrick Stolee wrote: > I don't think pairing this with pack-objects or index-pack is a good > direction, because the commit graph is not locked into a packfile the way > the bitmap is. In fact, the entire ODB could be replaced independently and > the graph is still valid (the commits in the graph may no longer have their > paired commits in the ODB due to a GC; you should never navigate to those > commits without having a ref pointing to them, so this is not immediately a > problem). One advantage of tying this to packs is that you can piggy-back on the .idx to avoid storing object ids a second time. If we imagine that you use a 32-bit index into the .idx instead, that's a savings of 16 bytes per object (or more when we switch to a longer hash). You only need to refer to commits and their root trees, though. So on something like linux.git, you're talking about 2 * 700k * 16 = 21 megabytes you could save. That may not be worth worrying about too much, compared to the size of the rest of the data. Disk space is obviously cheap, but I'm more concerned about working-set size. However, 21 megabytes probably isn't breaking the bank there these days (and it may even be faster, since the commit-graph lookups can use the more compact index that contains only commits, not other objects). The big advantage of your scheme is that you can update the graph index without repacking. The traditional advice has been that you should always do a full repack during a gc (since it gives the most delta opportunities). So metadata like reachability bitmaps were happy to tied to packs, since you're repacking anyway during a gc. But my understanding is that this doesn't really fly with the Windows repository, where it's simply so big that you never obtain a single pack, and just pass around slices of history in pack format. So I think I'm OK with the direction here of keeping metadata caches separate from the pack storage. > This sort of interaction with GC is one reason why I did not include the > automatic updates in this patch. The integration with existing maintenance > tasks will be worth discussion in its own right. I'd rather demonstrate the > value of having a graph (even if it is currently maintained manually) and > then follow up with a focus to integrate with repack, gc, etc. > > I plan to clean up this patch on Monday given the feedback I received the > last two days (Thanks Jonathan and Szeder!). However, if the current builtin > design will block merging, then I'll wait until we can find one that works. If they're not tied to packs, then I think having a separate builtin like this is the best approach. It gives you a plumbing command to experiment with, and it can easily be called from git-gc. -Peff