On Fri, May 03, 2019 at 08:47:25AM -0400, Derrick Stolee wrote:
> It would be much simpler to restrict the model. Your idea of changing
> the file name is the inspiration here.
>
> * The "commit-graph" file is the base commit graph. It is always
>   closed under reachability (if a commit exists in this file, then
>   its parents are also in this file). We will also consider this to
>   be "commit-graph-0".
>
> * A commit-graph-<N> exists, then we check for the existence of
>   commit-graph-<N+1>. This file can contain commits whose parents
>   are in any smaller file.
>
> I think this resolves the issue of back-compat without updating
> the file format:
>
> 1. Old clients will never look at commit-graph-N, so they will
>    never complain about an "incomplete" file.
>
> 2. If we always open a read handle as we move up the list, then
>    a "merge and collapse" write to commit-graph-N will not
>    interrupt an existing process reading that file.

What if a process reading the commit-graph files runs short on file
descriptors and has to close some of them, while a second process is
merging commit-graph files?

> I'll start hacking on this model.

Have fun! :)

Semi-related, but I'm curious: what are your plans for 'struct
commit's 'graph_pos' field, and how will it work with multiple
commit-graph files?

In particular: currently we use this 'graph_pos' field as an index
into the Commit Data chunk to find the metadata associated with a
given commit object.  But we could add any commit-specific metadata in
a new chunk, being an array ordered by commit OID, and then use
'graph_pos' as an index into this chunk as well.  I find this quite
convenient.  However, with multiple commit-graph files there will be
multiple arrays...
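
To make the 'graph_pos' question concrete, here is a rough sketch of
one possible answer.  It is not Git's actual code, and nothing in this
thread settles on it.  It assumes that positions are numbered
cumulatively across commit-graph-0, commit-graph-1, and so on, so that
a single 'graph_pos' can still be split into a file and an index local
to that file.  'struct graph_layer', 'split_graph_pos()',
'lookup_extra()' and the 'extra' chunk are all hypothetical names made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one entry per commit-graph-<N> file, lowest N first. */
struct graph_layer {
	uint32_t num_commits;  /* number of commits stored in this file */
	const uint32_t *extra; /* hypothetical per-commit chunk of this file */
};

/*
 * Split a global position into (layer, local index), assuming positions
 * are numbered cumulatively across the files.
 * Returns 0 on success, -1 if 'pos' is past the last file.
 */
int split_graph_pos(const struct graph_layer *layers, size_t nr,
		    uint32_t pos, size_t *layer, uint32_t *local)
{
	size_t i;

	assert(layers && layer && local);

	for (i = 0; i < nr; i++) {
		if (pos < layers[i].num_commits) {
			*layer = i;
			*local = pos;
			return 0;
		}
		/* Not in this file: skip over its commits. */
		pos -= layers[i].num_commits;
	}
	return -1;
}

/* Read the hypothetical extra chunk for the commit at global 'pos'. */
int lookup_extra(const struct graph_layer *layers, size_t nr,
		 uint32_t pos, uint32_t *out)
{
	size_t layer;
	uint32_t local;

	if (split_graph_pos(layers, nr, pos, &layer, &local) < 0)
		return -1;
	*out = layers[layer].extra[local];
	return 0;
}
```

Under this assumption every additional chunk stays a plain array per
file, indexed by the local position.  The catch is that a "merge and
collapse" renumbers the commits of the merged files, so any
'graph_pos' values a running process holds in memory would no longer
match the files on disk.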