Re: fast-import's notemodify doesn't work the same as git notes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(sorry for the late answer, I've been away from email over the holidays)

On Tue, Dec 23, 2014 at 1:06 AM, Mike Hommey <mh@xxxxxxxxxxxx> wrote:
> Hi,
>
> There are two major differences between adding notes with fast-import
> and git notes, one of which is a serious problem:
>
> - fast-import doesn't want to add notes for non commits, while git notes
>   does.

True. I consider this a bug in fast-import. A first draft of a patch
series to fix that will be posted shortly.

> - fast-import and git notes have different, conflicting fanouts:

Yes. The relevant code in fast-import and git-notes is largely
separate and independent of each other.

The main reason for this, IIRC, is that fast-import uses its own data
structures and algorithms when building trees, and that makes it hard
to reuse the git-notes code (fast-import.c does not even #include
notes.h because of this).

fast-import is fundamentally built to solve a somewhat different use
case than the rest of the git tools, and I believe this is why it
"goes it alone" and implements its own (duplicate - to a certain
degree) data structures and algorithms: It is heavily geared towards
creating a _lot_ of trees in as little time and space as possible.

The fast-import documentation states that the packs it generates are
sub-optimal, and that you should always repack after fast-import to
get more optimal/usable packs. Although not documented, a similar
analogy applies to notes trees: fast-import will import a lot of notes
quickly and efficiently, but the resulting fanout might not be the
same (or as optimal) as what git-notes would produce.

IMHO, it is not optimal (at least not currently) to use a workflow
that interleaves notes tree manipulation from fast-import and
git-notes, like you do below. You risk a lot of fanout churn, since
the fanout calculation logic is different between the two. That said,
there are clearly also bugs in the current fast-import behavior that
we should fix... :)

>   - take e.g. the git repo (there needs to be a lot of commits to start
>     to see the problem)
>   - run the following to create notes for every commit:
>       (echo 'blob';
>        echo 'mark :1';
>        echo 'data 0';
>        echo 'commit refs/notes/foo';
>        echo 'committer <foo> 0 +0000';
>        echo 'data 0';
>        git rev-list --all | awk '{print "N :1", $1}';
>        echo) | git fast-import
>   - pick a random commit. I'll pick
>     bbcefffcea9789e4a1a2023a1c778e2c07db77a7 as it is current master as
>     of writing. Take the first two characters of that sha1, and look at
>     the ls-tree:
>       git ls-tree refs/notes/foo bb/
>     You'll see a number of blobs.

Ok, so fast-import yields a 2/38 fanout for ~50 000 notes (#commits in
git.git). fast-import keeps track of the total number of notes in the
tree (although there are bugs, see the patch series), and divides that
number repeatedly by 256 to find the fanout, so this result is
expected, AFAICS

>   - Now, remove the note for that commit with git notes:
>       git notes --ref foo remove bbcefffcea9789e4a1a2023a1c778e2c07db77a7
>   - ls-tree again, you'll now see a number of trees instead of blobs,
>     because git notes will have done a fanout. -> git notes does fanouts
>     for much less items than fast-import does.

Maybe. git-notes has a different heuristic that does not keep track of
the total number of notes in the tree. This is because git-notes only
loads the required parts of the notes tree, so when you remove that
single note, only the relevant parts of the notes tree (the bb/* tree)
is loaded and manipulated. Then, git-notes looks at the density of
note entries in that sub-tree (see determine_fanout() in notes.c for
the details) to figure out if the fanout needs to be adjusted (but
only in that sub-tree). In this case, it looks like the heuristic
triggers a nested fanout within the bb/* tree. However, the other 255
"top-level" trees are not loaded, and hence not "re-balanced" by
git-notes.

>   - Re-add a note for that commit with fast-import:
>       git fast-import <<EOF
>       blob
>       mark :1
>       data 0
>       commit refs/notes/foo
>       committer <foo> 0 +0000
>       data 0
>       from refs/notes/foo^0
>       N :1 bbcefffcea9789e4a1a2023a1c778e2c07db77a7
>
>       EOF
>   - ls-tree again, and you'll see a number of trees and *one* blob, for
>     bb/cefffcea9789e4a1a2023a1c778e2c07db77a7

Hmm... Here I'd expect fast-import to rewrite the entire notes tree,
not just a single entry. Not sure if this is a symptom of the bug
discussed in our previous thread, or if this is a deeper problem.

>   - See the thread starting with 20141126004242.GA13915@xxxxxxxxxxxx,
>     this type of notes branch make things very slow.

I have a suggested fix for this in my upcoming patch series. Please
test this with your scenario to see if it makes a difference.

>   - Now, if you take an even bigger repository (as long as there are more
>     than 65536 commits, that's good ; I guess the linux kernel
>     qualifies, I've been checking with a mozilla-central clone), and
>     create exactly 65535 notes with git fast-import, you'll end up with
>     a 1-level tree (2/38). Add one more note, and the entire tree turns
>     into a 2-level tree (2/2/36). git notes would only add a level to
>     the tree containing the added note. git notes's behavior scales
>     better, because think about what happens on the next fanout with
>     fast-import... adding one note would need to create millions of trees.

True, this is a good illustration of the difference between the notes
code in fast-import and git-notes. It might be possible to change the
fast-import code to work more like the git-notes code, but it's been
quite a while since I looked closely at this, and I'm not sure it is
as easy as it sounds.


Have fun! :)

...Johan

-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux