Re: Optimizing writes to unchanged files during merges?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16/04/18 17:07, Lars Schneider wrote:
> 
> I am happy to see this discussion and the patches, because long rebuilds 
> are a constant annoyance for us. We might have been bitten by the exact 
> case discussed here, but more often, we have a slightly different 
> situation:
> 
> An engineer works on a task branch and runs incremental builds — all 
> is good. The engineer switches to another branch to review another 
> engineer's work. This other branch changes a low-level header file, 
> but no rebuild is triggered. The engineer switches back to the previous 
> task branch. At this point, the incremental build will rebuild 
> everything, as the compiler thinks that the low-level header file has
> been changed (because the mtime is different).
> 
> Of course, this problem can be solved with a separate worktree. However, 
> our engineers forget about that sometimes, and then, they are annoyed by 
> a 4h rebuild.
> 
> Is this situation a problem for others too?
> If yes, what do you think about the following approach:
> 
> What if Git kept a LRU list that contains file path, content hash, and 
> mtime of any file that is removed or modified during a checkout. If a 
> file is checked out later with the exact same path and content hash, 
> then Git could set the mtime to the previous value. This way the 
> compiler would not think that the content has been changed since the 
> last rebuild.

Hi Lars

But if there has been rebuild between the checkouts then you
want the compiler to rebuild. I've been using the script below
recently to save and restore mtimes around running rebase to squash
fixup commits. To avoid restoring the mtimes if there has been a
rebuild since they were stored it takes a list of build sentinels and
stores their mtimes too - if any of those change then it will refuse
to restore the original mtimes of the tracked files (if you give a
path that does exist when the mtimes are stored then it will refuse to
restore the mtimes if that path exists when you run 'git mtimes
restore'). The sentinels can be specified on the commandline when
running 'git mtimes save' or stored in multiple mtimes.sentinal config
keys.

Best Wishes

Phillip

--->8---

#!/usr/bin/perl

# Copyright (C) 2018 Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
#
# git-mtimes.perl
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

use 5.008;
use strict;
use warnings;
use File::Copy (qw(copy));
use File::Spec::Functions (qw(abs2rel catfile file_name_is_absolute rel2abs));
use Storable ();

sub git;

my $GIT_DIR = git(qw(rev-parse --git-dir)) or exit 1;
$GIT_DIR = rel2abs($GIT_DIR);
my $mtimes_path = "$GIT_DIR/mtimes";

sub git {
    my @lines;
    # in a scalar context slurp removing any trailing $/
    # in an array context return a list of lines
    {
	local $/ = wantarray ? $/ : undef;
	local $,=' ';
	open my $fh, '-|', 'git', @_ or die "git @_ failed $!";
	@lines = <$fh>;
	chomp @lines;
	unless (close $fh) {
	    $? == -1 and die "git @_ not found";
	    my $code = $? >> 8;
	    $_[0] eq 'config' and $code == 1 or
		die "git @_ failed with exit code $code"
	}
    }
    wantarray and return @lines;
    @lines and chomp @lines;
    return $lines[0];
}

sub ls_files {
    # mode, uid, gid, mtime and maybe atime
    my @stat_indices = $_[0] ? (2, 4, 5, 9, 8) : (2, 4, 5, 9);
    local $_;
    local $/ = "\0";
    my @files;
    for (git(qw(ls-files --stage -z))) {
	if (/^[^ ]+ ([^\t]+) 0\t(.*)/) {
	    my @stat = stat($2);
	    # store name, hash, mode, uid, gid, mtime and maybe atime
	    push @files, [ $2, $1, @stat[@stat_indices] ];
	}
    }
    return @files;
}

sub get_config {
    local $/ = "\0";
    my $get = wantarray ? '--get-all' : '--get';
    git(qw(config -z), $get, @_);
}

sub save {
    local $_;
    my @sentinels = get_config('mtimes.sentinel');
    push @sentinels, @ARGV;
    @sentinels or die "No sentinels given";
    @sentinels = map { [ $_, [ stat $_ ] ] } @sentinels;
    my @files = ls_files();
    Storable::nstore( [ [ @sentinels ] , [ @files ] ], $mtimes_path) or
	die "unable to store mtimes $!";
}

sub match_sentinel_data {
    local $_;
    my ($old, $new, $trustctime) = @_;
    if (!@$old) {
	return (@$new) ? undef : 1;
    } else {
	@$new or return undef;
    }
    # Skip hardlink count, atime, blksize
    for (0..2,4..7,9,10,12) {
	next if ($_ == 10 and ! $trustctime);
	$old->[$_] == $new->[$_] or return undef;
    }
    return 1;
}

sub needs_update {
    local $_;
    my ($old, $new) = @_;
    for (0..1) {
	$old->[$_] eq $new->[$_] or return undef;
    }
    for (2..4) {
	$old->[$_] == $new->[$_] or return undef;
    }
    $old->[5] != $new->[5];
}

sub restore {
    local $_;
    my $stored = Storable::retrieve($mtimes_path) or
	die "unable to load stored data";
    my $trustctime = get_config('--bool', 'core.trustctime');
    $trustctime = defined($trustctime) ? $trustctime eq 'true' : 1;
    my ($sentinels, $oldfiles) = @$stored;
    for (@$sentinels) {
	match_sentinel_data( [ stat($_->[0]) ], $_->[1], $trustctime) or
	    die "Unable to restore mtimes, stat data for sentinel '$_->[0]' does not match";
    }
    my @newfiles = ls_files(1);
    my ($i, $restored) = (0, 0);
    for (@$oldfiles) {
	while ($newfiles[$i]->[0] lt $_->[0] and $i < @newfiles) {
	    $i++;
	}
	if (needs_update($_, $newfiles[$i])) {
	    utime($newfiles[$i]->[6], $_->[5], $_->[0]);
	    $restored = 1;
	}
    }
    if ($restored) {
	print "restored mtimes\n";
    }
}

my $cmd = shift;
# Keep relative paths relative in case repository directory is renamed
# between saving and restoring mtimes.
if ($ENV{GIT_PREFIX}) {
    @ARGV = map {
	file_name_is_absolute($_) ? $_ : catfile($ENV{GIT_PREFIX}, $_);
    } @ARGV;
}
my $up = git(qw(rev-parse --show-cdup));
if ($up) {
    @ARGV = map {
	file_name_is_absolute($_) ? $_ : abs2rel(rel2abs($_), $up);
    } @ARGV;
    chdir $up;
}

my $tmp_index = catfile($GIT_DIR, "mtimes-index");
my $src_index = $ENV{GIT_INDEX_FILE} ? $ENV{GIT_INDEX_FILE} :
				       catfile($GIT_DIR, "index");
copy($src_index, $tmp_index) or
    die "cannot create temporary index '$tmp_index'\n";
$ENV{GIT_INDEX_FILE} = $tmp_index;
git(qw(add -u));

if ($cmd eq 'save') {
    save();
} elsif ($cmd eq 'restore' and ! @ARGV) {
    restore();
} else {
    print STDERR "usage: git mtimes <save [sentinels ...] | restore>\n";
}

END {
    unlink $tmp_index;
}

> I think that would fix the problem that our engineers run into and also 
> the problem that Linus experienced during the merge, wouldn't it?
> 
> Thanks,
> Lars
> 




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux