Re: [PATCH] Implement limited context matching in git-apply.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds <torvalds@xxxxxxxx> writes:

> On Mon, 10 Apr 2006, Eric W. Biederman wrote:
>> 
>> If I just loop through all of Andrews patches in order
>> and run git-apply --index -C1 I process the entire patchset
>> in 1m53s or about 6 patches per second.  So running
>> git-mailinfo, git-write-tree, git-commit-tree, and
>> git-update-ref everytime has a measurable impact,
>> and shows things can be speeded up even more.
>
> git-write-tree is actually a fairly expensive operation on the kernel. It 
> needs to write the 1000+ tree objects - and while _most_ of them already 
> exist (and thus don't actually need to be written out), we need to 
> generate the tree object and its SHA1 in order to notice that that is the 
> case.
>
> I'm almost certain that 90%+ of the overhead you see is the tree writing, 
> not the rest of the scripting.

Well it is easy enough to time.  Looking at the timings
going from just git-apply to git-apply && git-write-tree
does seem to about the double the amount of time taken,
or take me to about 4 minutes.  With everything else
in there things happen in the 6-7 minute range with
in the hot cache scenario.  So write-tree is closer
to 50% of the overhead.

Is it possible to cache the sha1 of unmodified directories?

If we did that we could probe to see if the hash already
existed before we attempted to look for the subdirectories.

The pain would is remembering which directory sha1 are
current.  If nothing else we can modify: 
remove_cache_entry, and add_file_to_cache to clear
the parent directories cached sha1 when we update an
index entry.  But I keep thinking there should
be something more elegant.  Like using ce_flags,
or comparing mtime values.

...

Ok taking a quick look at write-tree to see where
the bottle neck is:  

I made two modified versions of write-tree. 
- git-write-tree-nowritetree which calls return just before calling
    write_tree.
- git-write-tree-nosha1write which does everything except call
    sha1_file_write.

With just git-apply and git-write-tree-nosha1write it takes
me about 3m:20s to process 2.6.17-rc1-mm2.

With just git-apply and git-write-tree-nowritetree it takes:
real    2m59.985s
user    1m38.353s
sys     0m31.445s

With just git-apply and /bin/true it takes:
real    2m1.581s
user    1m3.169s
sys     0m29.903s


Looking at the individual numbers:
$ time git-write-tree-nowritetree --missing-ok

real    0m0.158s
user    0m0.052s
sys     0m0.008s
$ time git-write-tree-nowritetree --missing-ok

real    0m0.155s
user    0m0.057s
sys     0m0.003s
$ time git-write-tree-nowritetree --missing-ok 
real    0m0.065s
user    0m0.057s
sys     0m0.002s
$ time git-write-tree-nowritetree --missing-ok

real    0m0.159s
user    0m0.055s
sys     0m0.005s
$ time git-write-tree-nowritetree --missing-ok

real    0m0.151s
user    0m0.054s
sys     0m0.007s
$ time git-write-tree-nowritetree --missing-ok

real    0m0.154s
user    0m0.056s
sys     0m0.005s

$ time git-write-tree-nosha1write --missing-ok
0000000000000000000000000000000000000000

real    0m0.199s
user    0m0.091s
sys     0m0.008s
$ time git-write-tree-nosha1write --missing-ok
0000000000000000000000000000000000000000

real    0m0.195s
user    0m0.094s
sys     0m0.007s
$ time git-write-tree-nosha1write --missing-ok
0000000000000000000000000000000000000000

real    0m0.198s
user    0m0.092s
sys     0m0.009s

$ time git-write-tree --missing-ok
0ecfe3dbc2e65aa9638c62abf0cf05057c77f884

real    0m0.217s
user    0m0.113s
sys     0m0.012s

$ time git-write-tree
0ecfe3dbc2e65aa9638c62abf0cf05057c77f884

real    0m0.276s
user    0m0.169s
sys     0m0.008s

So at a quick inspection it looks to me like:
About .059s to perform to check for missing files.
About .019s to write the new tree.
About .155s in start up overhead, read_cache, and sanity checks.

So at a first glance it looks like librification to
allow the redundant work to be skipped, is where
the big speed win on my machine would be.

> Your patch looks ok from a quick read-through:

Thanks.

My import of 2.6.17-rc1-mm2 gives exactly the same
result as simply applying Andrews patch.  Which while
not definitive hits a lot of interesting cases.

> Acked-by: Linus Torvalds <torvalds@xxxxxxxx>
>
> 		Linus
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]