On Jan 20, 2008, at 7:20 PM, Linus Torvalds wrote:
On Sun, 20 Jan 2008, Steffen Prohaska wrote:
So your test-case does have relatively more overhead (compared to what
got optimized) than the numbers I quoted.
I cloned the kernel tree and measured these numbers:
Best time before:
$ time git commit >/dev/null
real 0m1.487s
user 0m0.786s
sys 0m0.696s
Best time after:
$ time git commit >/dev/null
real 0m1.087s
user 0m0.578s
sys 0m0.508s
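To put the two runs side by side, the relative saving can be computed directly from the numbers above. This is only a small sketch; the helper name `reduction` is made up for illustration and is not part of git.

```python
# Timings copied from the two "time git commit" runs above (seconds).
before = {"real": 1.487, "user": 0.786, "sys": 0.696}
after = {"real": 1.087, "user": 0.578, "sys": 0.508}


def reduction(old, new):
    """Return the time saved as a fraction of the old time."""
    return (old - new) / old


for key in ("real", "user", "sys"):
    print(f"{key}: {reduction(before[key], after[key]):.1%} less time")
```

All three components shrink by roughly the same fraction, a bit over a quarter.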
We also do know that while Linux has a very low-overhead lstat(), the
real problem on OS X has been the memory management, not the filesystem.
We had some case where the page fault overhead was something like two
orders of magnitude bigger, didn't we?
(Yeah - just checked. Commit 6d2fa7f1b489c65e677c18eda5c144dbc5d614ab:
"index-pack usage of mmap() is unacceptably slower.."). That took a
minute on Linux, and an hour on OS X.
It would be good to have a system-level profile of something like this.
On Linux, it's now mostly in user space with "git commit", and oprofile
shows:
[...]
I used Mac OS X's Sampler to collect some statistics.
It took me some time to discover and work around an annoying
limitation: Sampler cannot simply sample a full run of a process;
it has to be stopped manually before the process terminates. You
literally have to push a button before the process exits--what a
crazy tool. I worked around this by adding a sleep(100) call.
Sampler allows you to prune the samples collected in
mach_wait_until(), so the relative numbers should be fine.
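The reason pruning keeps the relative numbers meaningful can be sketched as follows: the time spent sleeping only adds samples under mach_wait_until(), so dropping those and renormalizing leaves the shares of the real work untouched. The sample counts and the helper name `prune_and_normalize` below are hypothetical; the raw counts were not part of the reports.

```python
def prune_and_normalize(samples, ignored=("mach_wait_until",)):
    """Drop idle samples and return each remaining function's share."""
    kept = {name: n for name, n in samples.items() if name not in ignored}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in kept.items()}


# Hypothetical sample counts, dominated by the artificial sleep.
samples = {"lstat": 250, "getdirentries": 160, "mach_wait_until": 9000}
shares = prune_and_normalize(samples)

for name, share in shares.items():
    print(f"{share:6.1%} {name}")
```

However long the sleep is, the ratio between lstat and getdirentries stays the same after pruning.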
Here is what I have.
Before Junio's and your patch, lstat dominates with 25%:
# Report - Timed Samples for git-commit
# Samples < 20 have been filtered
# Generated from the visible portion of the outline view
+ 25% lstat
| + 25% lstat
| | - 17% wt_status_print
| | - 8.4% refresh_index
+ 16% getdirentries
| + 16% getdirentries
| | - 16% read_directory_recursive
+ 10% fnmatch
| + 10% fnmatch
| | + 10% excluded
| | | - 10% read_directory_recursive
+ 9.7% sha1_block_data_order
| + 9.7% sha1_block_data_order
| | + 9.7% SHA1_Update
| | | + 8.0% read_index_from
| | | | - 6.8% wt_read_cache
| | | - 1.6% ce_write
+ 8.6% cache_name_compare
| + 8.6% cache_name_compare
| | - 6.9% cmp_cache_name_compare
| | - 1.6% index_name_pos
- 5.2% __mmap
- 3.2% open
+ 2.0% excluded
| + 2.0% excluded
| | - 2.0% read_directory_recursive
- 1.6% mbrtowc_l
- 1.5% find_pack_entry_one
- 1.4% cmp_cache_name_compare
- 1.3% qsort
After your patches getdirentries and fnmatch dominate:
# Report - Timed Samples for git-commit
# Samples < 20 have been filtered
# Generated from the visible portion of the outline view
+ 29% getdirentries
| + 29% getdirentries
| | - 29% read_directory_recursive
+ 19% fnmatch
| + 19% fnmatch
| | + 19% excluded
| | | - 19% read_directory_recursive
+ 9.6% lstat
| + 9.6% lstat
| | - 9.5% refresh_index
+ 5.5% open
| + 5.5% open
| | + 3.9% excluded
| | | - 3.9% read_directory_recursive
| | - 1.6% read_directory_recursive
+ 3.9% cache_name_compare
| + 3.9% cache_name_compare
| | - 3.8% index_name_pos
+ 3.0% sha1_block_data_order
| + 3.0% sha1_block_data_order
| | + 3.0% SHA1_Update
| | | - 1.6% read_index_from
+ 2.8% excluded
| + 2.8% excluded
| | - 2.8% read_directory_recursive
- 2.5% __memcpy
- 1.9% strcmp
- 1.6% close
- 1.4% fnmatch1
The reports above were created with Sampler's "Invert call tree"
option, while the next one is generated without this option.
# Report - Timed Samples for git-commit
# Samples < 20 have been filtered
# Generated from the visible portion of the outline view
+ 85% run_status
| + 85% wt_status_print
| | + 73% read_directory
| | | + 73% read_directory_recursive
| | | | + 73% read_directory_recursive
| | | | | + 71% read_directory_recursive
| | | | | | + 44% read_directory_recursive
| | | | | | | + 15% excluded
| | | | | | | | - 12% fnmatch
| | | | | | | | - 1.6% open
| | | | | | | - 12% getdirentries
| | | | | | | + 10% read_directory_recursive
| | | | | | | | - 4.0% getdirentries
| | | | | | | | - 2.6% excluded
| | | | | | | | - 1.7% read_directory_recursive
| | | | | | | - 2.6% dir_add_name
| | | | | | + 14% excluded
| | | | | | | - 11% fnmatch
| | | | | | - 8.0% getdirentries
| | | | | | - 1.6% dir_add_name
| | - 11% run_diff_index
+ 15% prepare_index
| + 10% refresh_index
| | - 9.7% lstat
| - 2.5% write_index
| - 1.9% read_index
With this kind of report you can see how the running time is
distributed over the different functions called by cmd_commit().
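The difference between the two kinds of report can be sketched with a few lines of code. This is an illustration of the general idea, not of how Sampler is implemented: each sample is a call stack, the normal view aggregates stacks from the outermost caller down, and the inverted view aggregates the same stacks starting at the innermost function. The stacks below are hypothetical and the function names `top_down` and `inverted` are made up.

```python
from collections import Counter


def top_down(stacks):
    """Inclusive share of samples per call path, outermost caller first."""
    if not stacks:
        return {}
    counts = Counter()
    for stack in stacks:
        for depth in range(1, len(stack) + 1):
            counts[tuple(stack[:depth])] += 1
    return {path: n / len(stacks) for path, n in counts.items()}


def inverted(stacks):
    """Same aggregation with every stack reversed (innermost first)."""
    return top_down([list(reversed(stack)) for stack in stacks])


# Hypothetical samples; each list is one sampled call stack.
stacks = [
    ["cmd_commit", "run_status", "wt_status_print", "lstat"],
    ["cmd_commit", "prepare_index", "refresh_index", "lstat"],
    ["cmd_commit", "run_status", "read_directory_recursive", "getdirentries"],
    ["cmd_commit", "run_status", "read_directory_recursive", "getdirentries"],
]

# Inverted view: how much time ends up in lstat, whoever called it.
print(inverted(stacks)[("lstat",)])                    # 0.5
# Normal view: how much time is spent below run_status.
print(top_down(stacks)[("cmd_commit", "run_status")])  # 0.75
```

The inverted view answers "which low-level function is expensive", the normal view answers "which part of the command is expensive".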
and one thing to look out for would be that some of these things might
be relatively much more costly on other systems.
fnmatch? It's certainly visible on Linux, I wonder if the Windows/OSX
version is more expensive due to trying to be case-insensitive or
something?
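One way to get a feeling for what case-insensitive matching costs is to compare an exact glob match with one that folds case first. The sketch below uses Python's fnmatch module, which is a separate implementation from the C library fnmatch() that git calls, so it says nothing definite about OS X or Windows; the file names, the pattern, and the helper names are made up.

```python
import fnmatch
import timeit


def match_exact(name, pattern):
    """Case-sensitive glob match."""
    return fnmatch.fnmatchcase(name, pattern)


def match_folded(name, pattern):
    """Case-insensitive glob match: fold both sides, then match."""
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())


if __name__ == "__main__":
    # Hypothetical working-tree names checked against one ignore pattern.
    names = [f"dir{i}/File{i}.C" for i in range(1000)]
    for fn in (match_exact, match_folded):
        seconds = timeit.timeit(
            lambda: [fn(n, "*.o") for n in names], number=100
        )
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Any difference seen here only shows the extra folding step; a system-level profile is still needed to tell whether the platform's fnmatch() is the real cost.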
So after your patches, roughly 50% of the running time (29% + 19%) is
spent in getdirentries and fnmatch on the MacBook Pro I used to run
these tests.
Steffen