On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider <larsxschneider@xxxxxxxxx> wrote:
>
>> On 16 Apr 2018, at 04:03, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>
>>> I think Elijah's corrected was_tracked() also does not care about
>>> "has this been renamed".
>>
>> I'm perfectly happy with the slightly smarter patches. My patch was
>> really just an RFC because I had tried it out.
>>
>>> One thing that makes me curious is what happens (and what we want to
>>> happen) when such a "we already have the changes the side branch
>>> tries to bring in" path has local (i.e. not yet in the index)
>>> changes. For a dirty file that trivially merges (e.g. a path we
>>> modified since our histories forked, while the other side didn't do
>>> anything, has local changes in the working tree), we try hard to
>>> make the merge succeed while keeping the local changes, and we
>>> should be able to do the same in this case, too.
>>
>> I think it might be nice, but probably not really worth it.
>>
>> I find the "you can merge even if some files are dirty" behavior
>> really convenient, because I often keep stupid test patches in my
>> tree that I may not even intend to commit, and I then use the same
>> tree for merging.
>>
>> For example, I sometimes end up editing the Makefile for the release
>> version early, but I won't *commit* that until I actually cut the
>> release. But if I pull some branch that has also changed the Makefile,
>> it's not worth any complexity to try to be nice about the dirty state.
>>
>> If it's a file that actually *has* been changed in the branch I'm
>> merging, I'm more than happy to just stage the patch (or throw it
>> away - I think it's about 50:50 for me).
>>
>> So I don't think it's a big deal, and I'd rather have the merge fail
>> very early with "that file has seen changes in the branch you are
>> merging" than add any real complexity to the merge logic.
>
> I am happy to see this discussion and the patches, because long rebuilds
> are a constant annoyance for us. We might have been bitten by the exact
> case discussed here, but more often we hit a slightly different
> situation:
>
> An engineer works on a task branch and runs incremental builds; all
> is good. The engineer switches to another branch to review another
> engineer's work. This other branch changes a low-level header file,
> but no rebuild is triggered. The engineer switches back to the previous
> task branch. At this point, the incremental build will rebuild
> everything, as the compiler thinks that the low-level header file has
> been changed (because the mtime is different).
>
> Of course, this problem can be solved with a separate worktree. However,
> our engineers sometimes forget about that, and then they are annoyed by
> a 4h rebuild.
>
> Is this situation a problem for others too?
> If yes, what do you think about the following approach:
>
> What if Git kept an LRU list that contains the file path, content hash,
> and mtime of any file that is removed or modified during a checkout? If
> a file is checked out later with the exact same path and content hash,
> then Git could set the mtime to the previous value. This way the
> compiler would not think that the content has changed since the last
> rebuild.

That would only work until they actually *did* a build on the second
branch; upon changing back, how would this detect that it needs to
update the mtime again? I don't think this solution really works.

Ultimately, the problem is that the build tool relies on the mtime to
determine what to rebuild. I think this would cause worse problems,
because we *wouldn't* rebuild in that case. How is Git supposed to know
whether we rebuilt while switching branches or not?

Thanks,
Jake

> I think that would fix the problem that our engineers run into and also
> the problem that Linus experienced during the merge, wouldn't it?
>
> Thanks,
> Lars
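For illustration, the mechanism Lars proposes could be sketched roughly as
below. This is a minimal standalone sketch, not anything Git implements; the
class and method names (`MtimeCache`, `record`, `maybe_restore`) are
hypothetical, and a real implementation would hook into checkout and persist
the cache on disk.

```python
# Hypothetical sketch of the proposed idea: remember (path, content hash,
# mtime) for files a checkout overwrites or removes, and restore the old
# mtime when the same content reappears at the same path. Git has no such
# feature; all names here are invented for illustration.
import hashlib
import os
from collections import OrderedDict


class MtimeCache:
    def __init__(self, max_entries=10000):
        # OrderedDict doubles as an LRU: oldest entries are evicted first.
        self._lru = OrderedDict()
        self._max = max_entries

    @staticmethod
    def _content_hash(path):
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def record(self, path):
        """Call before a checkout overwrites or deletes `path`."""
        key = (path, self._content_hash(path))
        self._lru.pop(key, None)
        self._lru[key] = os.stat(path).st_mtime
        if len(self._lru) > self._max:
            self._lru.popitem(last=False)  # evict least recently used

    def maybe_restore(self, path):
        """Call after a checkout writes `path`; restore mtime on a match."""
        key = (path, self._content_hash(path))
        mtime = self._lru.pop(key, None)
        if mtime is not None:
            os.utime(path, (mtime, mtime))
            return True
        return False
```

Jake's objection maps directly onto this sketch: once a build runs on the
second branch, object files are newer than the restored mtime, so restoring
it on switch-back would wrongly suppress a needed rebuild, and the cache has
no way to know whether a build happened in between.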