On 8/21/2019 1:35 PM, SZEDER Gábor wrote:
> On Wed, Aug 21, 2019 at 11:53:28AM -0400, Derrick Stolee wrote:
>> On 8/21/2019 7:04 AM, SZEDER Gábor wrote:
>>> With rename detection enabled the line-level log is able to trace the
>>> evolution of line ranges across whole-file renames [1]. Alas, to
>>> achieve that it uses the diff machinery very inefficiently, making the
>>> operation very slow [2]. And since rename detection is enabled by
>>> default, the line-level log is very slow by default.
>>>
>>> When the line-level log processes a commit with rename detection
>>> enabled, it currently does the following (see queue_diffs()):
>>>
>>> 1. Computes a full tree diff between the commit and (one of) its
>>>    parent(s), i.e. invokes diff_tree_oid() with an empty
>>>    'diffopt->pathspec'.
>>> 2. Checks whether any paths in the line ranges were modified.
>>> 3. Checks whether any modified paths in the line ranges are missing
>>>    in the parent commit's tree.
>>> 4. If there is such a missing path, then calls diffcore_std() to
>>>    figure out whether the path was indeed renamed based on the
>>>    previously computed full tree diff.
>>> 5. Continues doing stuff that is unrelated to the slowness.
>>>
>>> So basically the line-level log computes a full tree diff for each
>>> commit-parent pair in step (1) to be used for rename detection in step
>>> (4) in the off chance that an interesting path is missing from the
>>> parent.
>>>
>>> Avoid these expensive and mostly unnecessary full tree diffs by
>>> limiting the diffs to paths in the line ranges. This is much cheaper,
>>> and makes step (2) unnecessary. If it turns out that an interesting
>>> path is missing from the parent, then fall back and compute a full
>>> tree diff, so the rename detection will still work.
>>
>> I applied your patches and tried them on our VFS-enabled version of Git
>> (see [1]). Unfortunately, the new logic is still triggering rename
>> detection, as measured by the number of objects being downloaded.
>
> Well, the goal of this patch was to avoid full tree diffs if possible,
> not to avoid rename detection :)
>
> Anyway, I wonder how 'git log -L1:your-evil-path --no-renames'
> fares as a baseline?

Yeah, adding --no-renames does really well, comparatively. Perhaps I'll
just recommend that users pass that flag for now.

Thanks,
-Stolee
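For reference, here is a minimal C sketch of the strategy the patch describes: a cheap, pathspec-limited check of only the interesting paths, with a full-tree scan for rename detection only when an interesting path is missing from the parent. Everything in it is hypothetical and simplified: 'struct tree', 'tree_lookup()', 'follow_paths()' and friends are not Git's API, trees are modelled as flat path/blob lists, and rename detection is reduced to exact blob matching rather than what diffcore_std() actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model: a tree is a flat list of (path, blob id). */
struct entry { const char *path; int blob; };
struct tree { const struct entry *entries; size_t n; };

static const struct entry *tree_lookup(const struct tree *t, const char *path)
{
	for (size_t i = 0; i < t->n; i++)
		if (strcmp(t->entries[i].path, path) == 0)
			return &t->entries[i];
	return NULL;
}

/* Cheap, pathspec-limited check: looks only at the one interesting path. */
static bool path_modified(const struct tree *parent, const struct tree *commit,
			  const char *path)
{
	const struct entry *a = tree_lookup(parent, path);
	const struct entry *b = tree_lookup(commit, path);
	if (!a || !b)
		return a != b; /* added or deleted */
	return a->blob != b->blob;
}

/*
 * Expensive fallback: scan the whole parent tree for a rename source.
 * Simplification: only exact renames (identical blob id) are detected.
 */
static const char *find_rename_source(const struct tree *parent,
				      const struct tree *commit,
				      const char *path)
{
	const struct entry *target = tree_lookup(commit, path);
	if (!target)
		return NULL;
	for (size_t i = 0; i < parent->n; i++) {
		const struct entry *e = &parent->entries[i];
		/* Same content, and the old path no longer exists in the commit. */
		if (e->blob == target->blob && !tree_lookup(commit, e->path))
			return e->path;
	}
	return NULL;
}

/*
 * For each interesting path, store in out[i] the path to follow in the
 * parent (NULL if no rename source was found, e.g. a newly added file).
 * Returns how many paths needed the full-tree fallback.
 */
static size_t follow_paths(const struct tree *parent, const struct tree *commit,
			   const char *const *paths, size_t npaths,
			   const char **out)
{
	size_t fallbacks = 0;
	for (size_t i = 0; i < npaths; i++) {
		out[i] = paths[i];
		if (!path_modified(parent, commit, paths[i]))
			continue;
		if (tree_lookup(parent, paths[i]))
			continue; /* modified in place: no rename to look for */
		/* Missing from the parent: only now pay for the full scan. */
		fallbacks++;
		out[i] = find_rename_source(parent, commit, paths[i]);
	}
	return fallbacks;
}
```

With a parent tree {a.c, old.c, b.c} and a commit tree {a.c (changed), new.c (same blob as old.c), b.c}, following {a.c, new.c, b.c} takes the fallback exactly once and maps new.c back to old.c. The sketch also shows why the fallback can still fire: any interesting path that is absent from the parent forces the full scan, whereas --no-renames skips that step entirely.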