On Mon, Aug 13, 2018 at 5:48 PM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Sun, Aug 12, 2018 at 1:16 AM Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> wrote:
> >
> > We do n-way merge by walking the source index and n trees at the same
> > time and add merge results to a new temporary index called o->result.
> > The merge result for any given path could be either
> >
> > - keep_entry(): same old index entry in o->src_index is reused
> > - merged_entry(): either a new entry is added, or an existing one updated
> > - deleted_entry(): one entry from o->src_index is removed
> >
> > For some reason [1] we keep making sure that the source index's
> > cache-tree is still valid if used by o->result: for all those
> > merged/deleted entries, we invalidate the same path in o->src_index,
> > so only cache-trees covering the "keep_entry" parts remain good.
> >
> > Because of this, the cache-tree from o->src_index can be perfectly
> > reused in o->result. And in fact we already rely on this logic to
> > reuse untracked cache in edf3b90553 (unpack-trees: preserve index
> > extensions - 2017-05-08). Move the cache-tree to o->result before
> > doing cache_tree_update() to reduce hashing cost.
> >
> > Since cache_tree_update() has risen up as one of the most expensive
> > parts in unpack_trees() after the last few patches, this does help
> > reduce unpack_trees() time significantly (on webkit.git):
> >
> >     before       after
> >   --------------------------------------------------------------------
> >   0.080394752  0.051258167 s: read cache .git/index
> >   0.216010838  0.212106298 s: preload index
> >   0.008534301  0.280521764 s: refresh index
> >   0.251992198  0.218160442 s:  traverse_trees
> >   0.377031383  0.374948191 s:  check_updates
> >   0.372768105  0.037040114 s:  cache_tree_update
> >   1.045887251  0.672031609 s: unpack_trees
>
> Cool, nice drop in both cache_tree_update() and unpack_trees().  But
> why did refresh_index() go up so much?  That should have been
> unaffected by this patch too, so it seems like something odd is going
> on.  Any ideas?

Probably fs cache and stuff. This is a laptop with just 4GB RAM and a
very slow disk, so if something triggers in the background and evicts
some of webkit.git's stat info, refresh_index will get hot fast (and
with 275k files, webkit.git needs quite a bit of RAM to make sure
stat() calls don't hit the disk).
--
Duy