Re: Optimizing writes to unchanged files during merges?

> On 16 Apr 2018, at 19:04, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> 
> 
> On Mon, Apr 16 2018, Lars Schneider wrote:
> 
>>> On 16 Apr 2018, at 04:03, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>> 
>>>> I think Elijah's corrected was_tracked() also does not care "has
>>>> this been renamed".
>>> 
>>> I'm perfectly happy with the slightly smarter patches. My patch was
>>> really just an RFC, sent because I had tried it out.
>>> 
>>>> One thing that makes me curious is what happens (and what we want to
>>>> happen) when such a "we already have the changes the side branch
>>>> tries to bring in" path has local (i.e. not yet in the index)
>>>> changes.  For a dirty file that trivially merges (e.g. a path we
>>>> modified since our histories forked, while the other side didn't do
>>>> anything, has local changes in the working tree), we try hard to
>>>> make the merge succeed while keeping the local changes, and we
>>>> should be able to do the same in this case, too.
>>> 
>>> I think it might be nice, but probably not really worth it.
>>> 
>>> I find the "you can merge even if some files are dirty" to be really
>>> convenient, because I often keep stupid test patches in my tree that I
>>> may not even intend to commit, and I then use the same tree for
>>> merging.
>>> 
>>> For example, I sometimes end up editing the Makefile for the release
>>> version early, but I won't *commit* that until I actually cut the
>>> release. But if I pull some branch that has also changed the Makefile,
>>> it's not worth any complexity to try to be nice about the dirty state.
>>> 
>>> If it's a file that actually *has* been changed in the branch I'm
>>> merging, I'm more than happy to just stage the patch (or throw it
>>> away - I think it's about 50:50 for me).
>>> 
>>> So I don't think it's a big deal, and I'd rather have the merge fail
>>> very early with "that file has seen changes in the branch you are
>>> merging" than add any real complexity to the merge logic.
>> 
>> I am happy to see this discussion and the patches, because long rebuilds
>> are a constant annoyance for us. We might have been bitten by the exact
>> case discussed here, but more often, we have a slightly different
>> situation:
>> 
>> An engineer works on a task branch and runs incremental builds — all
>> is good. The engineer switches to another branch to review another
>> engineer's work. This other branch changes a low-level header file,
>> but no rebuild is triggered. The engineer switches back to the previous
>> task branch. At this point, the incremental build will rebuild
>> everything, as the compiler thinks that the low-level header file has
>> been changed (because the mtime is different).
>> 
>> Of course, this problem can be solved with a separate worktree. However,
>> our engineers sometimes forget about that, and then they are annoyed by
>> a 4h rebuild.
>> 
>> Is this situation a problem for others too?
>> If yes, what do you think about the following approach:
>> 
>> What if Git kept an LRU list that contains the file path, content hash,
>> and mtime of any file that is removed or modified during a checkout. If a
>> file is checked out later with the exact same path and content hash,
>> then Git could set the mtime to the previous value. This way the
>> compiler would not think that the content has been changed since the
>> last rebuild.
>> 
>> I think that would fix the problem that our engineers run into and also
>> the problem that Linus experienced during the merge, wouldn't it?
> 
> Could what you're describing be prototyped as a post-checkout hook that
> looks at the reflog? It sounds to me like it could, but perhaps I've
> missed some subtlety.

Yeah, probably. You don't even need the reflog, I think. I just wanted
to get a sense of whether other people run into this problem too.
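
A rough, untested Python sketch of such a post-checkout hook, just to make
the idea concrete (everything here is made up for illustration: the cache
file under .git/ and the helper names; a real version would also have to
handle dirty files, renames, and performance):

    #!/usr/bin/env python3
    # Hypothetical .git/hooks/post-checkout prototype of the idea above:
    # remember (path, blob hash, mtime) across checkouts and restore the
    # old mtime when a path comes back with identical content.
    import json
    import os
    import subprocess
    import sys

    # Invented cache location; git itself knows nothing about this file.
    CACHE = os.path.join(".git", "mtime-cache.json")

    def tracked_blobs():
        """Map each tracked path to the blob hash recorded in the index."""
        out = subprocess.run(["git", "ls-files", "-s", "-z"],
                             capture_output=True, text=True,
                             check=True).stdout
        blobs = {}
        for entry in out.split("\0"):
            if not entry:
                continue
            meta, path = entry.split("\t", 1)
            _mode, blob, _stage = meta.split()
            blobs[path] = blob
        return blobs

    def main():
        # The hook's arguments (old HEAD, new HEAD, branch flag) are not
        # needed for this sketch.
        try:
            with open(CACHE) as f:
                cache = json.load(f)   # {"path": [blob, mtime], ...}
        except (OSError, ValueError):
            cache = {}

        for path, blob in tracked_blobs().items():
            if not os.path.exists(path):
                continue
            old = cache.get(path)
            if old and old[0] == blob:
                # Same content as when we last saw this path: put the old
                # mtime back so the build system does not consider the
                # file changed.
                os.utime(path, (old[1], old[1]))
            cache[path] = [blob, os.stat(path).st_mtime]

        with open(CACHE, "w") as f:
            json.dump(cache, f)
        return 0

    if __name__ == "__main__":
        sys.exit(main())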


> Not re-writing out a file that hasn't changed is one thing, but I think
> for more complex behaviors (such as the "I want everything to have the
> same mtime" mentioned in another thread on-list), and this, it makes
> sense to provide some hook mechanism rather than have git itself do all
> the work.
> 
> I also don't see how what you're describing could be generalized, or
> even be made to work reliably in the case you're describing. If the
> engineer runs "make" on the branch he's testing out, that might produce
> an object file that'll get used as-is once he switches back, since
> you've set the mtime in the past for that file because you re-checked it
> out.

Ohh... you're right. I thought Visual Studio looked *just* at the ctime/mtime
of the files. But this seems not to be true [1]:
 
   "MSBuild to build it quickly checks if any of a project’s input files 
    are modified later than any of the project’s outputs"

In that case my idea outlined above is garbage.
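
For reference, the up-to-date check described in the article (and the one
make performs) boils down to comparing input mtimes against the output
mtime, roughly like this simplified Python illustration (not the actual
build-tool code):

    import os

    def output_is_up_to_date(output, inputs):
        """Treat an output as current if it is at least as new as every input."""
        out_mtime = os.stat(output).st_mtime
        return all(os.stat(src).st_mtime <= out_mtime for src in inputs)

With the mtime-restoring idea, an object file built on the reviewed branch
stays newer than the header whose old mtime was put back, so it would pass
this check and be reused as-is - exactly the problem you describe.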

Thanks,
Lars


[1] https://blogs.msdn.microsoft.com/kirillosenkov/2014/08/04/how-to-investigate-rebuilding-in-visual-studio-when-nothing-has-changed/


