Hi Johan, On Sun, 26 Jan 2020, Johan Herland wrote: > On Sun, Jan 26, 2020 at 12:16 PM Johannes Schindelin > <Johannes.Schindelin@xxxxxx> wrote: > > On Sat, 25 Jan 2020, brian m. carlson wrote: > > > This test relies on a roughly equal distribution of hashes for notes in > > > order to ensure that fanouts are compressed. If there are subtrees with > > > only one item left after removing notes, they'll end up still with one > > > level of fanout, causing the test to fail. > > > > That is _almost_ correct: The heuristic wants to see one bucket that has > > a note in it. Or something like that. > > > > See 73f77b909f8 (Notes API: for_each_note(): Traverse the entire notes > > tree with a callback, 2010-02-13) for details. (Cc:ing Johan.) > > Something like that, yeah... Re-reading this code, I believe we stop > the fanout at the current level when we can find one or more notes > that do not share the high-nibble of their path with another note. > > Here we're at the top level, so this corresponds to looking at the > very first hex character (0-9a-f) of the path (oid of annotated > object), and if there are at least two such objects for each hex > character, we will use a fanout of 1, otherwise, we collapse the > fanout to 0. That makes sense, but when I looked at the failed test, there seems something else at play, at least in addition to what you said: for _adding_ notes, your description is 100% accurate, but when deleting notes, we are apparently not collapsing `PTR_TYPE_SUBTREE` nodes "quickly" enough. Let me show you the part of `git ls-tree -r refs/notes/commits` that starts with the hex digits 7, 8 and 9: -- snip -- 100644 blob 22939636f79a53181ea58f04f2136bd745976edd0018465e60a8df89816a9c2b 72/67b1c94ab5ddf005fd4b0e50b6a7816a62e7ed0459cec6aa1a00577b2111ba 100644 blob 2e3f3dad7043b2d02d09ca12b299acbe05d781357d3a56a2d012fdf787409459 76/62d4b264094ca479be86ef7aa66daae63e45afa633a3892dc787b13ecff495 100644 blob 22e7f0c5af7315d692f8a107a43ba0784e1ab00a20ea803fab1acad1319e5f79 7b/8c0a9c86815a94da7ef90b356b1f98d6a4099af3fdc3d8625a8fa793b63821 100644 blob b3ce67c9d5507dc00d95d8fb2000c1d5b70908ad1d2c034e5833f57b7bb85511 7f/0af9cf9259cd6e67c0af3324ca443dd3d56694fbdc94d28e300a768d3d0e6e 100644 blob fafb40e32e87a2c481df6ceb37804d80c995faec3d7772c071b129fd47c2ba8a 8a/f1ad99a5e559f5835007f2bcb1b07de0e8c7434e7fbaa676a2edbd796a7f60 100644 blob b811e8da7c7acc83ef025753504ff6ed2d1eb8d2bb832d6b7487a7786d67aa53 93/f364fafaee6a178d8e8939f15d5b260f71940f663c3731396ee43082fd6551 100644 blob 0b1915ccb6f64cac64f2297893bce1ba7408660f79182302faaada71ee8c3c1c 96/b562b168f94e5c6e6a1e40ec4b817aa168c99769b1d9a46f7e048e93897fb4 -- snap -- This command was run in the worktree after running an unmodified t3305 with `GIT_TEST_DEFAULT_HASH=sha256` in brian's `transition-stage-4` branch (well, I removed the `SHA1` prereq again, that's the modification I made). As you can see, there _is_ a fanout of 1, but there is only a single note for a commit whose ID starts with `8`. Debugging a little, it seems that the `PTR_TYPE_SUBTREE` node for `8` was not collapsed, even if there was only one item left. So I formed a hypothesis that the subtrees are only removed when the _last_ item is removed for any given leading nybble, but that turned out to also be incorrect. It is a bit more tricky than that. This is the smallest set to which I was able to reduce the notes without reducing the fanout to 0: -- snip -- 100644 blob cc2ee5e00d8bc3805d704b67439dc8175bcf9497603288e6de2d4d8b3fc7be9f 05/b6e3d1394d020129f71fd8e41cf7ea8cbc58ae0f1332abd5da0c74ea194b71 100644 blob 1c6afe76a1bcd0103c36ab9707a2ca9e68974b6a6bbaae564c0509c43b4392bd 1e/311d64dc3ad5491964bcded60fee15b19d5b9c916a7e62a4f0746fa4e81fa6 100644 blob cf97d105e3970ef1cf9b12ac092be80abcd496c593bc8ca5550d059c3967630a 28/79e092d524b7ae9a42026ab2886094cce8ffc63175f8b3fd5de84faef10df3 100644 blob 78ef788b804dd0e5415c386b4a29668f61033483c35438f0471dfd7c4bfa093b 36/fe8fe67a2e9d0203c665d6e08ca454833ec32a97369769a7138d3938c0000b 100644 blob f46845c2d7e3272319aa5c18e359fdbc37731c88a945bb9632b8c4321983c75a 4a/a271a09d848f99d3fb978e5c156baa812a0fa1c30a88c885831641630be01a 100644 blob ecf5a4a178ca4b51cea457abe7c935761ca15a1d817f83c2da6816ede84db779 51/eaee3ca1a8698cb0aaa4de2d3f339985570da68b28e84af752e1cfd25f5197 100644 blob 2f4e9d6a4a1d050f8cb932ba545e53b48d9b669f6e00dc4a962d88ee9d92482b 62/dd63d43070c3ca7e3a6cdfa4ed970256a00a06e88d10fcb0532acd51419e0f 100644 blob 22939636f79a53181ea58f04f2136bd745976edd0018465e60a8df89816a9c2b 72/67b1c94ab5ddf005fd4b0e50b6a7816a62e7ed0459cec6aa1a00577b2111ba 100644 blob fafb40e32e87a2c481df6ceb37804d80c995faec3d7772c071b129fd47c2ba8a 8a/f1ad99a5e559f5835007f2bcb1b07de0e8c7434e7fbaa676a2edbd796a7f60 100644 blob b811e8da7c7acc83ef025753504ff6ed2d1eb8d2bb832d6b7487a7786d67aa53 93/f364fafaee6a178d8e8939f15d5b260f71940f663c3731396ee43082fd6551 100644 blob 589656c26944c471b5ac65739f8c7b96663a9f827a3d27beb49e39e3707b7294 a2/0e8f30856061125d479779755ef3238a7b561f9336e0143c437daac7d93f4c 100644 blob 7eaa9350d5c6bd6fd0fc4071c5d6a266949046c67987a9e1f665ba34d95f419d a2/80d13a05aa68cc5ef948a8b69067807457fd37c00ce4a234fa4a0c0753ef4e 100644 blob 7a8378dd60d6024db645757eac7271a80b101a2df230ebd1a57ff52ff2d32e36 b9/2a3a7dc6db290d93f79150bfb31447f3e550cfe4a63c5ddbeac18fec755e86 100644 blob 86fff6007d9911249bee803a6630e182677355a5574e637d1a8301e219c5da86 c1/52ffd73d1c5e7c121c7c247682f1ee971f6f09101c96a84486d18be41d0dd0 100644 blob cd887698e1da81a76ae1caf0eaec19d60830a13ead152ec4700be511ceb8ee33 d0/3f742b8b95f68478946d7fe7495da9462801fadeaaa06c11bf54dbc46610f5 100644 blob 19f7bdfee9687311dbe1195e0a64954b677ae68e6d734fd5fb76ea4ad4f93782 ea/356ae2d38123b46639db98df14953f4c7cdd91738779174ec67876ce9487e3 100644 blob 948c0cba23ec0405c622c9dea8ed8dd7b3fa043c86b5e5a8b4de0d1c6a0e67b9 f7/802d4c716fed3c76fe58c86ac7c3ae3e19b8c0d3ea97c9f90f5939fe5a78d8 100644 blob 28690ad489e29e3607c82e1f626ec24d7f831555c802108bfe9b993fbd794a7e f7/da03e811b7d9071ee19dabdc721e0f863e28c92dfa3257474282396a73bb44 -- snap -- You will note that there are two entries that start with `a2`, and two entries that start with `f7`. If I remove any of those, the corresponding subtree will be collapsed, and the fanout will be reduced to 0. But it is only happenstance with SHA-256 that there are these entries that agree not only in the first, but also in the second leading nybble. Therefore... > Hence we need an absolute minimum of 32 notes (and some rotten luck) > to get a fanout of 1. As the number of notes increase, the probably of > fanning out increases, passing 50% at ~79 notes, and reaching ~100% > somewhere north of 150 notes. ... I would register that we need an absolute minimum of 16 notes (and some rather crafty craft) to get a fanout of 1. In that light, I think that I would prefer to retract my patch that "only" reduces the remaining number of notes to 20: it should reduce them to 15 or less. So why not reduce it to 10 (because it is only one changed digit). > > > The test happens to pass with SHA-1, but doesn't necessarily with other > > > hash algorithms, so annotate it with the SHA1 prerequisite. > > > > I would rather see this tested, still, and reducing the number of notes > > that are retained from 50 to 20 before testing that the fanout has been > > reduced to 0 seems to do the trick. Therefore, I would love to submit this > > for squashing: > > Yes, it seems that for SHA1 and the (deterministic) objects used in > the test, we got away with 50 notes, but that is not the case for > other hash algorithms. Lowering the number to 20 definitely results a > fanout of 0, as should any other number below 32. > > +1 to Dscho's squash. > > ...Johan Thank you so much for the analysis. To be honest, I did not quite understand all the details of the comment added in 73f77b989f8 when I wrote the patch I suggested, so I basically just picked that number "20" out of thin air. Together with your insights, I would like to propose this commit message for the squashed commits (I left in the hunk that removes the `SHA1` prerequisite, but of course that won't be part of the final commit): -- snip -- t3305: make fanout test more robust (needed for SHA-256) To make things more performant, notes are stored in a "fanout": when there are enough commit notes, they are no longer stored as verbatim commit IDs at the top-level tree of the notes ref, but instead the tree is deepened much like the loose object cache: subtrees are introduced whose names are the two hex digits they "chomp off" the commit IDs. The test case 'deleting most notes triggers fanout consolidation' wants to verify that the fanout level is reduced automatically when enough notes have been deleted. However, that test case expected that reduction to level 0 (i.e. _no_ fanout subtrees) to happen after reducing the originally-added 300 notes to 50, which _happened_ to work with SHA-1-based commit IDs, but it is no longer works with SHA-256-based ones. The reason: The heuristic for the fanout looks at the number of entries for leading nybbles (read: hex digits) of the commit IDs. If there are more than a single annotated commit for all of the 16 hex digits, the fanout is incremented. It is a bit more tricky when reducing the number of notes: the fanout is reduced reliably only if there are less notes than hex digits (i.e. less than 15 notes) for a given prefix. For good measure, let's reduce the number of notes to 10 in the test case 'deleting most notes with git-notes' so that the test case 'deleting most notes triggers fanout consolidation' is guaranteed to succeed with _any_ hash algorithm. Original-patch-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> Helped-by: Johan Herland <johan@xxxxxxxxxxx> Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx> --- diff --git a/t/t3305-notes-fanout.sh b/t/t3305-notes-fanout.sh index 3520402bb81..b83c2670ebe 100755 --- a/t/t3305-notes-fanout.sh +++ b/t/t3305-notes-fanout.sh @@ -43,7 +43,7 @@ test_expect_success 'many notes created with git-notes triggers fanout' ' ' test_expect_success 'deleting most notes with git-notes' ' - num_notes=250 && + num_notes=290 && i=0 && git rev-list HEAD | while test $i -lt $num_notes && read sha1 @@ -56,8 +56,8 @@ test_expect_success 'deleting most notes with git-notes' ' ' test_expect_success 'most notes deleted correctly with git-notes' ' - git log HEAD~250 | grep "^ " > output && - i=50 && + git log HEAD~290 | grep "^ " > output && + i=10 && while test $i -gt 0 do echo " commit #$i" && @@ -67,7 +67,7 @@ test_expect_success 'most notes deleted correctly with git-notes' ' test_cmp expect output ' -test_expect_success SHA1 'deleting most notes triggers fanout consolidation' ' +test_expect_success 'deleting most notes triggers fanout consolidation' ' # Expect entire notes tree to have a fanout == 0 git ls-tree -r --name-only refs/notes/commits | while read path -- snap -- brian, would you mind adopting this patch into your patch series? For your convenience, I pushed it up as `t3305-sha256-fanout` (based on your `transition-stage-4`) to https://github.com/dscho/git. Thanks, Dscho