(sorry for the late answer, I've been away from email over the holidays) On Tue, Dec 23, 2014 at 1:06 AM, Mike Hommey <mh@xxxxxxxxxxxx> wrote: > Hi, > > There are two major differences between adding notes with fast-import > and git notes, one of which is a serious problem: > > - fast-import doesn't want to add notes for non commits, while git notes > does. True. I consider this a bug in fast-import. A first draft of a patch series to fix that will be posted shortly. > - fast-import and git notes have different, conflicting fanouts: Yes. The relevant code in fast-import and git-notes is largely separate and independent of each other. The main reason for this, IIRC, is that fast-import uses its own data structures and algorithms when building trees, and that makes it hard to reuse the git-notes code (fast-import.c does not even #include notes.h because of this). fast-import is fundamentally built to solve a somewhat different use case than the rest of the git tools, and I believe this is why it "goes it alone" and implements its own (duplicate - to a certain degree) data structures and algorithms: It is heavily geared towards creating a _lot_ of trees in as little time and space as possible. The fast-import documentation states that the packs it generates are sub-optimal, and that you should always repack after fast-import to get more optimal/usable packs. Although not documented, a similar analogy applies to notes trees: fast-import will import a lot of notes quickly and efficiently, but the resulting fanout might not be the same (or as optimal) as what git-notes would produce. IMHO, it is not optimal (at least not currently) to use a workflow that interleaves notes tree manipulation from fast-import and git-notes, like you do below. You risk a lot of fanout churn, since the fanout calculation logic is different between the two. That said, there are clearly also bugs in the current fast-import behavior that we should fix... :) > - take e.g. the git repo (there needs to be a lot of commits to start > to see the problem) > - run the following to create notes for every commit: > (echo 'blob'; > echo 'mark :1'; > echo 'data 0'; > echo 'commit refs/notes/foo'; > echo 'committer <foo> 0 +0000'; > echo 'data 0'; > git rev-list --all | awk '{print "N :1", $1}'; > echo) | git fast-import > - pick a random commit. I'll pick > bbcefffcea9789e4a1a2023a1c778e2c07db77a7 as it is current master as > of writing. Take the first two characters of that sha1, and look at > the ls-tree: > git ls-tree refs/notes/foo bb/ > You'll see a number of blobs. Ok, so fast-import yields a 2/38 fanout for ~50 000 notes (#commits in git.git). fast-import keeps track of the total number of notes in the tree (although there are bugs, see the patch series), and divides that number repeatedly by 256 to find the fanout, so this result is expected, AFAICS > - Now, remove the note for that commit with git notes: > git notes --ref foo remove bbcefffcea9789e4a1a2023a1c778e2c07db77a7 > - ls-tree again, you'll now see a number of trees instead of blobs, > because git notes will have done a fanout. -> git notes does fanouts > for much less items than fast-import does. Maybe. git-notes has a different heuristic that does not keep track of the total number of notes in the tree. This is because git-notes only loads the required parts of the notes tree, so when you remove that single note, only the relevant parts of the notes tree (the bb/* tree) is loaded and manipulated. Then, git-notes looks at the density of note entries in that sub-tree (see determine_fanout() in notes.c for the details) to figure out if the fanout needs to be adjusted (but only in that sub-tree). In this case, it looks like the heuristic triggers a nested fanout within the bb/* tree. However, the other 255 "top-level" trees are not loaded, and hence not "re-balanced" by git-notes. > - Re-add a note for that commit with fast-import: > git fast-import <<EOF > blob > mark :1 > data 0 > commit refs/notes/foo > committer <foo> 0 +0000 > data 0 > from refs/notes/foo^0 > N :1 bbcefffcea9789e4a1a2023a1c778e2c07db77a7 > > EOF > - ls-tree again, and you'll see a number of trees and *one* blob, for > bb/cefffcea9789e4a1a2023a1c778e2c07db77a7 Hmm... Here I'd expect fast-import to rewrite the entire notes tree, not just a single entry. Not sure if this is a symptom of the bug discussed in our previous thread, or if this is a deeper problem. > - See the thread starting with 20141126004242.GA13915@xxxxxxxxxxxx, > this type of notes branch make things very slow. I have a suggested fix for this in my upcoming patch series. Please test this with your scenario to see if it makes a difference. > - Now, if you take an even bigger repository (as long as there are more > than 65536 commits, that's good ; I guess the linux kernel > qualifies, I've been checking with a mozilla-central clone), and > create exactly 65535 notes with git fast-import, you'll end up with > a 1-level tree (2/38). Add one more note, and the entire tree turns > into a 2-level tree (2/2/36). git notes would only add a level to > the tree containing the added note. git notes's behavior scales > better, because think about what happens on the next fanout with > fast-import... adding one note would need to create millions of trees. True, this is a good illustration of the difference between the notes code in fast-import and git-notes. It might be possible to change the fast-import code to work more like the git-notes code, but it's been quite a while since I looked closely at this, and I'm not sure it is as easy as it sounds. Have fun! :) ...Johan -- Johan Herland, <johan@xxxxxxxxxxx> www.herland.net -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html