On Sun, Aug 12, 2018 at 1:16 AM Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> wrote: > > We do n-way merge by walking the source index and n trees at the same > time and add merge results to a new temporary index called o->result. > The merge result for any given path could be either > > - keep_entry(): same old index entry in o->src_index is reused > - merged_entry(): either a new entry is added, or an existing one updated > - deleted_entry(): one entry from o->src_index is removed > > For some reason [1] we keep making sure that the source index's > cache-tree is still valid if used by o->result: for all those > merged/deleted entries, we invalidate the same path in o->src_index, > so only cache-trees covering the "keep_entry" parts remain good. > > Because of this, the cache-tree from o->src_index can be perfectly > reused in o->result. And in fact we already rely on this logic to > reuse untracked cache in edf3b90553 (unpack-trees: preserve index > extensions - 2017-05-08). Move the cache-tree to o->result before > doing cache_tree_update() to reduce hashing cost. > > Since cache_tree_update() has risen up as one of the most expensive > parts in unpack_trees() after the last few patches. This does help > reduce unpack_trees() time significantly (on webkit.git): > > before after > -------------------------------------------------------------------- > 0.080394752 0.051258167 s: read cache .git/index > 0.216010838 0.212106298 s: preload index > 0.008534301 0.280521764 s: refresh index > 0.251992198 0.218160442 s: traverse_trees > 0.377031383 0.374948191 s: check_updates > 0.372768105 0.037040114 s: cache_tree_update > 1.045887251 0.672031609 s: unpack_trees Cool, nice drop in both cache_tree_update() and unpack_trees(). But why did refresh_index() go up so much? That should have been unaffected by this patch to, so it seems like something odd is going on. Any ideas?