Re: [PATCH v4 5/5] unpack-trees: reuse (still valid) cache-tree from src_index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 8/13/2018 11:48 AM, Elijah Newren wrote:
On Sun, Aug 12, 2018 at 1:16 AM Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> wrote:

We do n-way merge by walking the source index and n trees at the same
time and add merge results to a new temporary index called o->result.
The merge result for any given path could be either

- keep_entry(): same old index entry in o->src_index is reused
- merged_entry(): either a new entry is added, or an existing one updated
- deleted_entry(): one entry from o->src_index is removed

For some reason [1] we keep making sure that the source index's
cache-tree is still valid if used by o->result: for all those
merged/deleted entries, we invalidate the same path in o->src_index,
so only cache-trees covering the "keep_entry" parts remain good.

Because of this, the cache-tree from o->src_index can be perfectly
reused in o->result. And in fact we already rely on this logic to
reuse untracked cache in edf3b90553 (unpack-trees: preserve index
extensions - 2017-05-08). Move the cache-tree to o->result before
doing cache_tree_update() to reduce hashing cost.

Since cache_tree_update() has risen up as one of the most expensive
parts in unpack_trees() after the last few patches. This does help
reduce unpack_trees() time significantly (on webkit.git):

     before       after
   --------------------------------------------------------------------
     0.080394752  0.051258167 s:  read cache .git/index
     0.216010838  0.212106298 s:  preload index
     0.008534301  0.280521764 s:  refresh index
     0.251992198  0.218160442 s:   traverse_trees
     0.377031383  0.374948191 s:   check_updates
     0.372768105  0.037040114 s:   cache_tree_update
     1.045887251  0.672031609 s:  unpack_trees

Cool, nice drop in both cache_tree_update() and unpack_trees().  But
why did refresh_index() go up so much?  That should have been
unaffected by this patch to, so it seems like something odd is going
on.  Any ideas?


I was part way through writing a patch that would copy the valid parts of the cache-tree from the source index to the dest index but the observation that the source index cache tree was already being invalidated properly which allows the simple pointer "copy" is much better!

I run some tests on a large repo and the results look very promising.

base	new	diff	% saved	
0.55	0.52	0.02	4.32%	s:  read cache .git/index
0.31	0.30	0.01	2.98%	s:  initialize name hash
0.03	0.02	0.00	9.98%	s:  preload index
0.09	0.09	0.00	4.86%	s:  refresh index
5.93	1.19	4.74	79.95%	s:   traverse_trees
0.12	0.13	-0.01	-4.15%	s:   check_updates
2.14	0.00	2.14	100.00%	s:   cache_tree_update
10.63	4.29	6.33	59.59%	s:  unpack_trees
0.97	0.91	0.06	6.41%	s:  write index, changed mask = 28
3.49	0.18	3.31	94.91%	s:    traverse_trees
0.00	0.00	0.00	17.53%	s:    check_updates
3.61	0.30	3.31	91.77%	s:   unpack_trees
3.61	0.30	3.31	91.77%	s:  diff-index
17.28	8.36	8.92	51.62%	s: git command: c:git.exe checkout

Same methodology as before, I ran "git checkout" 5 times, threw away the first 2 runs and averaged the last 3. I entered 0 for the "new" cache_tree_update line as it no longer reports anything.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux