Derrick Stolee <dstolee@xxxxxxxxxxxxx> writes:

> The commit-graph file stores a condensed version of the commit history.
> This helps speed up several operations involving commit walks. This
> feature does not work well if those commits "change" using features like
> commit grafts, replace objects, or shallow clones.

I like to think about those features as providing an overlay for the
commit graph (similar to overlay filesystems, like overlayfs); in the
case of git-replace, quite literally. I will call all of those features
"history-altering features", for short.

> Since the commit-graph feature is optional, hidden behind the
> 'core.commitGraph' config option, and requires manual maintenance (until
> ds/commit-graph-fsck is merged), these issues only arise for expert
> users who decided to opt-in.
>
> This RFC is a first attempt at rectifying the issues that come up when
> these features interact. I have not succeeded entirely, as I will
> discuss below.

What I would like to see here in the cover letter is a generic
description of the _strategy_ for dealing with those history-altering
features.

From what I understand, you have the following options for each of
replacements, shallow clones and grafts:

 - turn off the serialized commit-graph if the given history-altering
   feature is present, as if core.commitGraph were set to false

 - invalidate and optionally refresh the commit-graph file if the given
   history-altering feature is present (maybe only if it changed the
   view of the history, if such a check is possible)

For automatic invalidation you would need to either:

 - cover all possible ways by which the given history-altering feature
   can change the view of history, or

 - keep the state of the history-altering change for which the
   commit-graph was created (e.g. in the proposed VAL4 chunk) and
   compare it with the current view of history to see if it changed

For automatic turning off you would only need to check whether the
history-altering feature is present.

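To make the "keep the state and compare" option more concrete, here is a
minimal sketch in Python (for illustration only, this is not Git code).
It assumes that the history-altering state can be reduced to a set of
text entries whose order does not matter; the names fingerprint() and
graph_is_current() are hypothetical, and where the stored digest would
live in the commit-graph file is left open.

```python
import hashlib
from typing import Iterable


def fingerprint(entries: Iterable[str]) -> str:
    """Digest of the history-altering state, independent of entry order.

    Assumption: each entry is one line of state, and blank entries and
    ordering carry no meaning.
    """
    normalized = sorted({e.strip() for e in entries if e.strip()})
    digest = hashlib.sha256()
    for entry in normalized:
        digest.update(entry.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def graph_is_current(stored: str, current_entries: Iterable[str]) -> bool:
    """True if the digest stored at write time matches the current state."""
    return stored == fingerprint(current_entries)
```

With something like this, the writer would record fingerprint(...) when
creating the commit-graph, and a reader would ignore (or rewrite) the
file whenever graph_is_current() returns False.
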
Let us examine each of the history-altering features that Git supports:

* shallow clone

  - a shallow clone usually means a shorter history, so the serialized
    commit-graph is not needed as much; also, changing the depth breaks
    assumptions about generation numbers

  - there are only a few entry points that change the view of history,
    namely fetch and push with shallow options (--depth=<depth>,
    --deepen=<depth>, --shallow-since=<date>, --update-shallow,
    --shallow-exclude=<revision>, --unshallow)

  - it is easy to check for the presence of the shallow clone feature
    by checking whether $GIT_DIR/shallow exists and is not empty

  - different contents of $GIT_DIR/shallow mean a different view of
    history

  - NOTE: internally uses the grafts mechanism (emulated grafts)

* replacements (replace objects, git-replace)

  - git-replace can be used to join the current development repository
    with a historical repository, in which case we would certainly want
    to make use of the serialized commit graph; on the other hand,
    replacements do not necessarily alter the view of the history

  - theoretically you could create replacement refs by hand, but in
    practice there are TWO ways of getting them:

    - using the git-replace command to create, edit/change and delete
      replacement objects
    - fetching refs into, or having refs pushed to, the refs/replace/*
      namespace

  - you need to check whether there are any refs in the refs/replace/*
    namespace to check if the feature is used (but it doesn't
    necessarily mean that it altered project history)

  - a changed list of refs in the refs/replace/* namespace (which you
    can get with the for-each-ref command/API) does not necessarily
    mean that the view of the history changed: you can replace a
    non-commit object, or you can replace commits without changing
    their parents; it is not as easy as checking whether a file exists

  - NOTE: the feature can be turned off manually with the
    GIT_NO_REPLACE_OBJECTS environment variable and with the
    --no-replace-objects option to the git wrapper.
    Also, when pushing, fetching and fsck-ing, this feature is turned
    off and refs in the refs/replace/* namespace are treated as
    ordinary refs. This may mean that we would want to create the
    commit-graph with replacements for an ordinary non-bare repository,
    and without replacements for a server-side bare repository.

* grafts

  - covers a subset of the uses of git-replace; an older and *obsolete*
    feature (because it is unsafe to use; that is, you need to be
    careful with it)

  - edited by hand, no automatic entry points

  - if the $GIT_DIR/info/grafts file is present, then the feature is
    enabled (barring some corner cases, like an empty file or a file
    consisting only of comments)

  - changed contents of this file mean a changed view of history (well,
    except for reordering lines, or removing/adding empty lines and
    comments)

>
> The first two "DO NOT MERGE" commits are not intended to be part of the
> main product, but are ways to help the full test suite run while
> computing and consuming commit-graph files. This is achieved by calling
> write_commit_graph_reachable() from `git fetch` and `git commit`. I
> believe this is the only dependence on ds/commit-graph-fsck. The other
> commits should only depend on ds/commit-graph-lockfile-fix.

That's a clever way of increasing coverage.

> Running the full test suite after these DO NOT MERGE commits results in
> the following test failures, which I categorize by type of failure.
>
> The following tests are red herrings. Most work by deleting a commit
> from the object database and trying to demonstrate a failure. Others
> rely on how certain objects are loaded. These are not bugs, but will
> add noise if you run the test suite with this patch.
>
> t0410-partial-clone.sh                  Failed tests:  5, 8
> t5307-pack-missing-commit.sh            Failed tests:  3-4
> t6011-rev-list-with-bad-commit.sh       Failed test:  4
> t6024-recursive-merge.sh                Failed test:  4

Does this mean that those tests should, in the end (i.e.
when core.commitGraph is turned on by default), simply be run with
core.commitGraph explicitly disabled for the test?

> The following tests are fixed in "commit-graph: enable replace-object
> and grafts".

Would it make sense to split this commit into two, dealing separately
with replace objects and with grafts? Or do they use the same underlying
API?

> t6001-rev-list-graft.sh                 Failed tests:  3, 5, 7, 9, 11, 13

O.K. It might be a good idea to add a separate test that does the same,
but with git-replace instead of grafts, though.

> t6050-replace.sh                        Failed tests:  11-15, 23-24, 29

The t6050-replace.sh script does not test changing the DAG of revisions
(excluding changing the SHA-1 of a commit), if I read it correctly. It
would be good to test using git-replace to change the committer date,
and to change parents: shortening or lengthening history (e.g. emulating
joining two independent lines of development in the latter case).

See also the comment about t6001 above.

>
> The following tests involve shallow clones.
>
> t5500-fetch-pack.sh                     Failed tests:  30-31, 34, 348-349
> t5537-fetch-shallow.sh                  Failed tests:  4-7, 9
> t5802-connect-helper.sh                 Failed test:  3
>
> These tests are mostly fixed by three commits:
>
> * commit-graph: avoid writing when repo is shallow
> * fetch: destroy commit graph on shallow parameters

Seems O.K. from the subject, but I'd have to check the details.

> * commit-graph: revert to odb on missing parents

I wonder if reverting to using the object database is a good solution,
and whether it wouldn't be better to invalidate / delete the
commit-graph file instead...

>
> Each of these cases covers a different interaction that occurs with
> shallow clones. Some are due to a commit becoming shallow, while others
> are due to a commit becoming unshallow (and hence invalidating its
> generation number).

Why not simply check if the repository is shallow?

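To illustrate what I mean by "simply check", here is a rough sketch in
Python of the presence checks described earlier in this reply. It is
not how Git itself implements them: the function names are hypothetical,
git_dir stands for $GIT_DIR, and treating lines starting with '#' as
comments in info/grafts is my assumption about the file format.

```python
from pathlib import Path


def is_shallow(git_dir: str) -> bool:
    """Shallow if $GIT_DIR/shallow exists and is not empty."""
    shallow = Path(git_dir) / "shallow"
    return shallow.is_file() and shallow.stat().st_size > 0


def has_grafts(git_dir: str) -> bool:
    """Grafts are in effect if $GIT_DIR/info/grafts has a real entry.

    An empty file, or one holding only blank lines and comments,
    does not count.
    """
    grafts = Path(git_dir) / "info" / "grafts"
    if not grafts.is_file():
        return False
    text = grafts.read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return True
    return False


def should_disable_commit_graph(git_dir: str) -> bool:
    """The 'turn off' strategy: skip the commit-graph if either is present."""
    return is_shallow(git_dir) or has_grafts(git_dir)
```

Replacements are left out on purpose: as noted above, the presence of
refs in refs/replace/* does not by itself tell you whether the view of
history changed.
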
>
> After these changes, there is one test case that still fails, and I
> cannot understand why:
>
> t5500-fetch-pack.sh                     Failed test:  348
>
> This test fails when performing a `git fetch --shallow-exclude`. When I
> halt the test using `t5500-fetch-pack.sh -d -i` and navigate to the
> directory to replay the fetch it performs as expected. After struggling
> with it for a while, I figured I should just put this series up for
> discussion. Maybe someone with more experience in shallow clones can
> point out the obvious issues I'm having.
>
> Thanks,
> -Stolee
>
> Derrick Stolee (6):
>   DO NOT MERGE: compute commit-graph on every commit
>   DO NOT MERGE: write commit-graph on every fetch
>   commit-graph: enable replace-object and grafts
>   commit-graph: avoid writing when repo is shallow
>   fetch: destroy commit graph on shallow parameters
>   commit-graph: revert to odb on missing parents
>
>  builtin/commit.c  |  5 +++++
>  builtin/fetch.c   | 10 +++++++++
>  builtin/gc.c      |  2 +-
>  builtin/replace.c |  3 +++
>  commit-graph.c    | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>  commit-graph.h    |  9 ++++++++
>  commit.c          |  5 +++++
>  environment.c     |  2 +-
>  8 files changed, 95 insertions(+), 6 deletions(-)