Re: [PATCH 2/2] line-log: avoid unnecessary full tree diffs

On 8/21/2019 1:35 PM, SZEDER Gábor wrote:
> On Wed, Aug 21, 2019 at 11:53:28AM -0400, Derrick Stolee wrote:
>> On 8/21/2019 7:04 AM, SZEDER Gábor wrote:
>>> With rename detection enabled the line-level log is able to trace the
>>> evolution of line ranges across whole-file renames [1].  Alas, to
>>> achieve that it uses the diff machinery very inefficiently, making the
>>> operation very slow [2].  And since rename detection is enabled by
>>> default, the line-level log is very slow by default.
>>>
>>> When the line-level log processes a commit with rename detection
>>> enabled, it currently does the following (see queue_diffs()):
>>>
>>>   1. Computes a full tree diff between the commit and (one of) its
>>>      parent(s), i.e. invokes diff_tree_oid() with an empty
>>>      'diffopt->pathspec'.
>>>   2. Checks whether any paths in the line ranges were modified.
>>>   3. Checks whether any modified paths in the line ranges are missing
>>>      in the parent commit's tree.
>>>   4. If there is such a missing path, then calls diffcore_std() to
>>>      figure out whether the path was indeed renamed based on the
>>>      previously computed full tree diff.
>>>   5. Continues doing stuff that is unrelated to the slowness.
>>>
>>> So basically the line-level log computes a full tree diff for each
>>> commit-parent pair in step (1) to be used for rename detection in step
>>> (4) in the off chance that an interesting path is missing from the
>>> parent.
>>>
>>> Avoid these expensive and mostly unnecessary full tree diffs by
>>> limiting the diffs to paths in the line ranges.  This is much cheaper,
>>> and makes step (2) unnecessary.  If it turns out that an interesting
>>> path is missing from the parent, then fall back and compute a full
>>> tree diff, so the rename detection will still work.
>>
>> I applied your patches and tried them on our VFS-enabled version of Git
>> (see [1]). Unfortunately, the new logic is still triggering rename
>> detection, as measured by the number of objects being downloaded.
> 
> Well, the goal of this patch was to avoid full tree diffs if possible,
> not to avoid rename detection :)
> 
> Anyway, I wonder how 'git log -L1:your-evil-path --no-renames'
> fares as a baseline?

Yeah, adding --no-renames does really well, comparatively. Perhaps I'll
just recommend that users pass that flag for now.
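
For anyone following along, here is a rough toy model of the control flow
discussed in this thread. It is only a sketch, not Git's actual C code: a
"tree" is assumed to be a plain dict mapping path to blob id, and the names
diff_trees(), queue_diffs_sketch() and detect_renames are hypothetical,
made up for illustration. It shows the path-limited diff with a fallback to
a full tree diff, and why turning rename detection off never reaches that
fallback.

```python
# Toy model only: a tree is a dict {path: blob_id}. Not Git's real code.

def diff_trees(parent, commit, paths=None):
    """Return {path: (old_id, new_id)} for entries that differ.

    If `paths` is given, only those paths are compared (a stand-in for a
    pathspec-limited diff); otherwise every path is compared (a stand-in
    for a full tree diff).
    """
    candidates = set(parent) | set(commit)
    if paths is not None:
        candidates &= set(paths)
    return {
        p: (parent.get(p), commit.get(p))
        for p in candidates
        if parent.get(p) != commit.get(p)
    }


def queue_diffs_sketch(parent, commit, tracked_paths, detect_renames=True):
    """Return (changes, used_full_diff) for one commit-parent pair."""
    # Cheap case first: diff only the paths that have tracked line ranges.
    limited = diff_trees(parent, commit, tracked_paths)
    if not detect_renames:
        # Models --no-renames: never look beyond the tracked paths.
        return limited, False

    # A tracked path present in the commit but absent from the parent
    # might have been renamed, so it needs a closer look.
    missing = [p for p, (old, new) in limited.items()
               if old is None and new is not None]
    if not missing:
        return limited, False

    # Fallback: full tree diff, the input rename detection would need.
    return diff_trees(parent, commit), True


parent = {"a.c": 1, "b.c": 2, "old.c": 3}
commit = {"a.c": 1, "b.c": 5, "new.c": 3}

unchanged = queue_diffs_sketch(parent, commit, ["a.c"])
modified = queue_diffs_sketch(parent, commit, ["b.c"])
renamed = queue_diffs_sketch(parent, commit, ["new.c"])
no_renames = queue_diffs_sketch(parent, commit, ["new.c"],
                                detect_renames=False)
```

In this model only the "new.c" case falls back to the full diff, and
passing detect_renames=False skips the fallback entirely. A history where
tracked paths are often missing from the parent would hit the expensive
branch often, which would be consistent with what I measured.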

Thanks,
-Stolee



