Re: [PATCH v2 04/14] commit-graph: implement construct_commit_graph()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/7/2018 10:08 AM, SZEDER Gábor wrote:
On Mon, Feb 5, 2018 at 5:06 PM, Derrick Stolee <stolee@xxxxxxxxx> wrote:
On 2/2/2018 10:32 AM, SZEDER Gábor wrote:
In my git repo, with 9 pack files at the moment, i.e. not that big a
repo and not that many pack files:

    $ time ./git commit-graph --write --update-head
    4df41a3d1cc408b7ad34bea87b51ec4ccbf4b803

    real    0m27.550s
    user    0m27.113s
    sys     0m0.376s

In comparison, performing a good old revision walk to gather all the
info that is written into the graph file:

    $ time git log --all --topo-order --format='%H %T %P %cd' |wc -l
    52954

    real    0m0.903s
    user    0m0.972s
    sys     0m0.058s

Two reasons this is in here:

(1) It's easier to get the write implemented this way and add the reachable
closure later (which I do).

(2) For GVFS, we want to add all commits that arrived in a "prefetch pack"
to the graph even if we do not have a ref that points to the commit yet. We
expect many commits to become reachable soon and having them in the graph
saves a lot of time in merge-base calculations.

So, (1) is for patch simplicity, and (2) is why I want it to be an option in
the final version. See the --stdin-packs argument later for a way to do this
incrementally.

I expect almost all users to use the reachable closure method with
--stdin-commits (and that's how I will integrate automatic updates with
'fetch', 'repack', and 'gc' in a later patch).
I see.  I was about to ask about the expected use-cases of the
'--stdin-packs' option, considering how much slower it is to enumerate
all objects in pack files, but run out of time after patch 10.

The run-time using '--stdin-commits' is indeed great:

   $ time { git for-each-ref --format='%(objectname)' refs/heads/ | ./git
     commit-graph --write --update-head --stdin-commits ; }
   82fe9a5cd715ff578f01f7f44e0611d7902d20c8

   real  0m0.985s
   user  0m0.916s
   sys   0m0.024s

Considering the run-time difference, I think in the end it would be a
better default for a plain 'git commit-graph --write' to traverse
history from all refs, and it should enumerate pack files only if
explicitly told so via '--stdin-packs'.

To be clear: I'm not saying that traversing history should already be
the default when introducing construct_commit_graph() and '--write'.  If
enumerating pack files keeps the earlier patches simpler and easier to
review, then by all means stick with it, and only change the
'--stdin-*'-less behavior near the end of the series, when all the
building blocks are already in place (but then mention this in the early
commit messages).

I will consider this.

I have also noticed a segfault when feeding non-commit object names to
'--stdin-commits', i.e. when I run the above command without restricting
'git for-each-ref' to branches and it listed object names of tags as
well.

   $ git rev-parse v2.16.1 |./git commit-graph --write --update-head
--stdin-commits
   error: Object eb5fcb24f69e13335cf6a6a1b1d4553fa2b0f202 not a commit
   error: Object eb5fcb24f69e13335cf6a6a1b1d4553fa2b0f202 not a commit
   error: Object eb5fcb24f69e13335cf6a6a1b1d4553fa2b0f202 not a commit
   Segmentation fault

(gdb) bt
#0  __memcpy_avx_unaligned ()
     at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:126
#1  0x00000000004ea97c in sha1write (f=0x356bbf0, buf=0x4, count=20)
     at csum-file.c:104
#2  0x00000000004d98e1 in write_graph_chunk_data (f=0x356bbf0, hash_len=20,
     commits=0x3508de0, nr_commits=50615) at commit-graph.c:506
#3  0x00000000004da9ca in construct_commit_graph (
     pack_dir=0x8ff360 ".git/objects/pack", pack_indexes=0x0, nr_packs=0,
     commit_hex=0x8ff790, nr_commits=1) at commit-graph.c:818
#4  0x000000000044184e in graph_write () at builtin/commit-graph.c:149
#5  0x0000000000441a8c in cmd_commit_graph (argc=0, argv=0x7fffffffe310,
     prefix=0x0) at builtin/commit-graph.c:224
#6  0x0000000000405a0a in run_builtin (p=0x8ad950 <commands+528>, argc=4,
     argv=0x7fffffffe310) at git.c:346
#7  0x0000000000405ce4 in handle_builtin (argc=4, argv=0x7fffffffe310)
     at git.c:555
#8  0x0000000000405ec8 in run_argv (argcp=0x7fffffffe1cc, argv=0x7fffffffe1c0)
     at git.c:607
#9  0x0000000000406079 in cmd_main (argc=4, argv=0x7fffffffe310) at git.c:684
#10 0x00000000004a85c8 in main (argc=5, argv=0x7fffffffe308)
     at common-main.c:43

I noticed this while preparing v3. I have a fix, but you now remind me that I need to add tags to the test script.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux