Being rather a power user of the interactive rebase, I discovered recently that one of my scripts ran into an odd problem where an interactive rebase appended new commands to the todo list, and the interactive rebase failed to validate the (short) OIDs. But when I tried to fix things via git rebase --edit-todo, the interactive rebase had no longer any problem to validate them. Turns out that I generated the objects during the interactive rebase (yes, I implemented a 2-phase rebase [https://github.com/git-for-windows/build-extra/blob/master/ever-green.sh]), and that their hashes happened to share the first two hex digits with some loose object that had already been read in the same interactive rebase run, causing a now-stale loose object cache to be initialized for that .git/objects/?? subdirectory. It was quite the, let's say, adventure, to track that one down. Over at the companion PR for Git for Windows [https://github.com/git-for-windows/git/pull/2121], I discussed this with Peff (who introduced the loose object cache), and he pointed out that my original solution was a bit too specific for the interactive rebase. He suggested the current method of re-reading upon a missing object instead, and explained patiently that this does not affect the code path for which the loose object cache was introduced originally: to help with performance issues on NFS when Git tries to find the same missing objects over and over again. The regression test still reflects what I need this fix for, and I would rather keep it as-is (even if the fix is not about the interactive rebase per se), as it really tests for the functionality that I really need to continue to work. Since I (and other contributors touching the rebase code) frequently run only the t34* subset of the test suite while developing git rebase patches, I also want to keep the test case there, rather than putting it in with other SHA1-specific disambiguation test cases in t1512-rev-parse-disambiguation.sh. My only concern was that this might cause some performance regressions that neither Peff nor I could think of, where get_oid() may run repeatedly into missing objects by design, and where we should not blow away and recreate the loose object cache all the time. But Peff dispelled my fears, pointing out that get_oid() is already rather expensive and therefore avoided in hot loops. Changes since v1: * The added test case now properly quotes $GIT_REBASE_TODO (even if it does not contain spaces, it is simply safer to quote variable expansions in Bash always [https://mywiki.wooledge.org/Quotes]. * The three lines to try again to disambiguate are no longer repeated. Instead, we use a Beautiful Goto. * The commit message of 2/2 does not stress lean on the loose object cache so much, but also talks about packs a bit more. Johannes Schindelin (4): rebase -i: demonstrate obscure loose object cache bug sequencer: improve error message when an OID could not be parsed sequencer: move stale comment into correct location get_oid(): when an object was not found, try harder sequencer.c | 5 +++-- sha1-name.c | 13 ++++++++++++- t/t3429-rebase-edit-todo.sh | 22 ++++++++++++++++++++++ 3 files changed, 37 insertions(+), 3 deletions(-) base-commit: e902e9bcae2010bc42648c80ab6adc6c5a16a4a5 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-161%2Fdscho%2Frereading-todo-list-and-loose-object-cache-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-161/dscho/rereading-todo-list-and-loose-object-cache-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/161 Range-diff vs v1: 1: b3fcd37765 ! 1: 63ddb1dd04 rebase -i: demonstrate obscure loose object cache bug @@ -60,11 +60,11 @@ + "committer A U Thor <author@xxxxxxxxxxx> $1 +0000" \ + "" \ + "$1" | -+ git hash-object -t commit -w --stdin))" >>$GIT_REBASE_TODO ++ git hash-object -t commit -w --stdin))" >>"$GIT_REBASE_TODO" + + shift + test -z "$*" || -+ echo "exec $0 $*" >>$GIT_REBASE_TODO ++ echo "exec $0 $*" >>"$GIT_REBASE_TODO" + EOS + + git rebase HEAD -x "./append-todo.sh 5 6" 2: d8c4a3dde5 = 2: f0cbfacf9a sequencer: improve error message when an OID could not be parsed 3: d749c63170 = 3: b55e14cd63 sequencer: move stale comment into correct location 4: 994446236d ! 4: 41fc6eb9ba get_oid(): when an object was not found, try harder @@ -3,8 +3,9 @@ get_oid(): when an object was not found, try harder It is quite possible that the loose object cache gets stale when new - objects are written. In that case, get_oid() would potentially say that - it cannot find a given object, even if it should find it. + objects are written. Or that a new pack was installed. In those cases, + get_oid() would potentially say that it cannot find a given object, even + if it should find it. Let's blow away the loose object cache as well as the read packs and try again in that case. @@ -27,19 +28,31 @@ --- a/sha1-name.c +++ b/sha1-name.c @@ + struct object_id *oid, + unsigned flags) + { +- int status; ++ int status, attempts = 0; + struct disambiguate_state ds; + int quietly = !!(flags & GET_OID_QUIETLY); + +@@ + else + ds.fn = default_disambiguate_hint; + ++try_again: + find_short_object_filename(&ds); find_short_packed_object(&ds); status = finish_object_disambiguation(&ds, oid); + /* -+ * If we didn't find it, do the usual reprepare() slow-path, -+ * since the object may have recently been added to the repository -+ * or migrated from loose to packed. ++ * If we did not find it, do the usual reprepare() slow-path, since the ++ * object may have recently been added to the repository or migrated ++ * from loose to packed. + */ -+ if (status == MISSING_OBJECT) { ++ if (status == MISSING_OBJECT && !attempts++) { + reprepare_packed_git(the_repository); -+ find_short_object_filename(&ds); -+ find_short_packed_object(&ds); -+ status = finish_object_disambiguation(&ds, oid); ++ goto try_again; + } + if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) { -- gitgitgadget