"git reflog expire --all" very slow

I haven't checked in detail what is up, but I just did a "git gc --prune", 
and it was quiet for about half a minute before anything seemed to happen.

Very irritating, as normally the expensive stuff at least gives you some 
kind of indication of what it's doing.

It turns out that it's the reflog expiration. On my crazy beefy 
Nehalem machine:

	[torvalds@nehalem linux]$ time git reflog expire --all

	real	0m37.596s
	user	0m37.554s
	sys	0m0.040s

and that really isn't good. 37 cpu-seconds on this machine is like half a 
decade on some laptops I could name.

The flat gprof profile for this thing (user-land oprofile isn't doing Nehalem 
yet) looks like this:

      %   cumulative   self              self     total           
     time   seconds   seconds    calls   s/call   s/call  name    
     60.94     30.24    30.24 301120211     0.00     0.00  interesting
     12.37     36.38     6.14 301338513     0.00     0.00  insert_by_date
     11.35     42.01     5.63     8776     0.00     0.00  clear_commit_marks
      9.96     46.95     4.94     4388     0.00     0.01  merge_bases_many
      2.16     48.02     1.07 301486366     0.00     0.00  commit_list_insert
      1.21     48.62     0.60 301329737     0.00     0.00  parse_commit
      0.87     49.05     0.43 301637945     0.00     0.00  xmalloc
      0.34     49.22     0.17       24     0.01     0.01  xstrdup
      ...

Ok, so my reflog on this thing has 1583 entries on HEAD (yes, all from the 
last 90 days: the problem is _not_ that I have a long reflog and am pruning 
it, it _is_ already pruned). Add to that the reflogs for the branches 
(mainly master: 1294), and you apparently end up with a nice total of 4388 
reflog entries.

And then it looks like for _each_ reflog entry we have:

  expire_reflog_ent()
    in_merge_bases()

which then calls 

  get_merge_bases()
    get_merge_bases_many()
      ..

each of which probably often traverses an appreciable part of the kernel 
tree, since my reflog entries are often merges, and finding their merge 
bases can easily mean walking thousands of commits.
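
To make the cost pattern concrete, here is a minimal Python sketch on a toy linear history. It is not git's code: `linear_history`, `is_ancestor`, and `expire_walk_cost` are hypothetical names, and a plain ancestry walk stands in for the real merge-base computation. It only shows how one independent walk per reflog entry adds up.

```python
from collections import deque


def linear_history(n):
    """Toy history: commit i has the single parent i - 1; commit 0 is the root."""
    return {i: ([i - 1] if i > 0 else []) for i in range(n)}


def is_ancestor(parents, candidate, tip):
    """Walk back from tip until candidate is found.

    Returns (found, number_of_commits_visited).
    """
    seen = {tip}
    queue = deque([tip])
    visited = 0
    while queue:
        commit = queue.popleft()
        visited += 1
        if commit == candidate:
            return True, visited
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False, visited


def expire_walk_cost(parents, reflog_entries, tip):
    """Total commits visited when every reflog entry triggers its own walk."""
    return sum(is_ancestor(parents, entry, tip)[1] for entry in reflog_entries)


history = linear_history(1000)          # 1000 commits, made up for the example
reflog = list(range(0, 1000, 10))       # 100 hypothetical reflog entries
total_visits = expire_walk_cost(history, reflog, tip=999)
print(total_visits, "visits for", len(history), "commits")
```

In this toy setup the 100 walks add up to 50,500 commit visits over a history of only 1000 commits, because the commits closest to the tip are walked again for every single entry.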

Which explains how you end up with 301 _million_ commits inserted into the 
lists and checked if they are interesting. Since the whole kernel tree has 
only something like 140k commits, and my reflog doesn't even go back more 
than three months, I guess that means that we'll be traversing the same 
commits tens of thousands of times each.
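
As a rough check of those numbers, the arithmetic can be spelled out in a few lines of Python. The call counts come from the profile above; the 140k figure is the approximate tree size quoted in the text, and `recent_commits` is a purely hypothetical guess at how many commits a three-month window might cover, included only to show how the per-commit figure moves.

```python
interesting_calls = 301_120_211   # calls to interesting() in the profile
reflog_entries = 4_388            # calls to merge_bases_many(), one per entry
total_commits = 140_000           # "something like 140k", approximate
recent_commits = 10_000           # HYPOTHETICAL size of a three-month window

calls_per_entry = interesting_calls / reflog_entries
calls_per_commit_whole_tree = interesting_calls / total_commits
calls_per_commit_recent = interesting_calls / recent_commits

print(round(calls_per_entry))              # checks per reflog entry
print(round(calls_per_commit_whole_tree))  # averaged over the whole tree
print(round(calls_per_commit_recent))      # if only recent commits are walked
```

That works out to roughly 68,600 checks per reflog entry. Averaged over the whole tree it is about 2,150 checks per commit, and the figure climbs into the tens of thousands if the walks are concentrated on a window of recent history, as the hypothetical 10,000-commit case suggests.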

Even on this machine, that whole cluster-f*ck takes a little while. Oops.

I have not checked if there is anything really obvious going on that could 
change that whole logic that causes us to do merge-bases into something 
saner, since the reflog code is not a part of git I'm familiar with. 
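
One direction that might be saner, sketched in Python on a toy history: walk back from the tip once, remember every commit reached, and answer each reflog entry with a set lookup. This is only an illustration, under the assumption that the per-entry question boils down to "is this commit reachable from the tip?". It is not a claim about what builtin-reflog.c does or should do, and all names here are hypothetical.

```python
def linear_history(n):
    """Toy history: commit i has the single parent i - 1; commit 0 is the root."""
    return {i: ([i - 1] if i > 0 else []) for i in range(n)}


def reachable_from(parents, tip):
    """Return the set of all commits reachable from tip, visiting each once."""
    seen = {tip}
    stack = [tip]
    while stack:
        commit = stack.pop()
        for parent in parents[commit]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


history = linear_history(1000)
history["side"] = [500]                 # a commit NOT reachable from the tip

reach = reachable_from(history, 999)    # one walk, shared by all entries
reflog = list(range(0, 1000, 10)) + ["side"]

# Each reflog entry is now answered by a set lookup instead of a fresh walk.
unreachable = [entry for entry in reflog if entry not in reach]
print(len(reach), "commits walked once;", "unreachable entries:", unreachable)
```

The trade-off is memory for the reachable set against the repeated traversals, and the single walk costs the same no matter how many reflog entries there are.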

Instead, I'm just sending this to Junio, Brandon, and Dscho, who are 
getting the main blame for 'builtin-reflog.c'. I'm pretty sure this is all 
Junio, but just in case..

			Linus