Re: [RFC PATCH] checkout: Force matching mtime between files

On Fri, Apr 27, 2018 at 7:18 PM, Michał Górny <mgorny@xxxxxxxxxx> wrote:
> On Wed, 25.04.2018 at 11:18 -0400, Marc Branchaud wrote:
>> On 2018-04-25 04:48 AM, Junio C Hamano wrote:
>> > "Robin H. Johnson" <robbat2@xxxxxxxxxx> writes:
>> >
>> > > In the thread from 6 years ago, you asked about tar's behavior for
>> > > mtimes. 'tar xf' restores mtimes from the tar archive, so relative
>> > > ordering after restore would be the same, and would only rebuild if the
>> > > original source happened to be dirty.
>> > >
>> > > This behavior is already non-deterministic in Git, and would be improved
>> > > by the patch.
>> >
>> > But Git is not an archiver (tar), but is a source code control
>> > system, so I do not think we should spend any extra cycles to
>> > "improve" its behaviour wrt the relative ordering, at least for the
>> > default case.  Only those who rely on having build artifact *and*
>> > source should pay the runtime (and preferably also the
>> > maintenance) cost.
>>
>> Anyone who uses "make" or some other mtime-based tool is affected by
>> this.  I agree that it's not "Everyone" but it sure is a lot of people.
>>
>> Are we all that sure that the performance hit is that drastic?  After
>> all, we've just done write_entry().  Calling utime() at that point
>> should just hit the filesystem cache.
>>
>> > The best approach to do so is to have those people do the "touch"
>> > thing in their own post-checkout hook.  People who use Git as the
>> > source control system won't have to pay runtime cost of doing the
>> > touch thing, and we do not have to maintain such a hook script.
>> > Only those who use the "feature" would.
>>
>> The post-checkout hook approach is not exactly straightforward.
>>
>> Naively, it's simply
>>
>>       for F in `git diff --name-only $1 $2`; do touch "$F"; done
>>
>> But consider:
>>
>> * Symlinks can cause the wrong file to be touched.  (Granted, Michał's
>> proposed patch also doesn't deal with symlinks.)  Let's assume that a
>> hook can be crafted with all possible sophistication.  There are still
>> some fundamental problems:
>>
>> * In a "file checkout" ("git checkout -- path/to/file"), $1 and $2 are
>> identical so the above loop does nothing.  Offhand I'm not even sure how
>> a hook might get the right files in this case.
>>
>> * The hook has to be set up in every repo and submodule (at least until
>> something like Ævar's experiments come to fruition).
>>
>> * A fresh clone can't run the hook.  This is especially important when
>> dealing with submodules.  (In one case where we were bit by this, make
>> thought that half of a fresh submodule clone's files were stale, and
>> decided to re-autoconf the entire thing.)
>>
>>
>> I just don't think the hook approach can completely solve the problem.
>>
>
> There's also the performance aspect.  If we deal with checkouts that
> include 1000+ files on a busy system (i.e. when mtimes really become
> relevant), calling utime() instantly has a good chance of hitting warm
> cache.  On the other hand, post-checkout hook has a greater risk of
> running on a cold cache, i.e. writing to all inodes twice.

The FS cache is evicted on an LRU basis. What you're saying is true,
but in the two different implementations there's maybe a 2-3 second
gap between what git is doing and what the post-checkout hook is doing. If
the system is under such memory pressure that you've evicted the pages
you just touched you're probably screwed anyway. Maybe I've missed
something here, but this point seems moot.

There are certainly other good arguments against using the current hook
implementation for this, e.g. not being able to do this on clone as
noted upthread.
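To make the hook-side trade-offs concrete, here is a rough sketch (bash, assuming GNU or BSD "touch") of a post-checkout hook that goes a bit beyond the naive loop quoted above. It reads NUL-separated paths from "git diff --name-only -z" so odd filenames survive, gives every touched file one shared timestamp, uses "touch -h" so a symlink itself is touched rather than its target (a GNU/BSD extension, not POSIX), and on a clone, where Git passes the all-zero null ref as the old HEAD, stamps every tracked file from "git ls-files -z". The function names stamp_paths and post_checkout are made up for this example. It still does nothing useful for file checkouts and still has to be installed in each repository, so it illustrates the objections rather than removing them.

```shell
#!/usr/bin/env bash
# Sketch of .git/hooks/post-checkout. Git passes:
#   $1 = previous HEAD, $2 = new HEAD,
#   $3 = 1 for a branch checkout (also after clone), 0 for a file checkout.

# Give every NUL-separated path read from stdin one shared mtime.
#   -c  skip paths that no longer exist instead of creating them
#   -h  touch a symlink itself, not its target (GNU/BSD, not POSIX)
stamp_paths() {
    local stamp path
    stamp=$(date +%Y%m%d%H%M.%S)   # computed once, reused for every file
    while IFS= read -r -d '' path; do
        touch -c -h -t "$stamp" -- "$path"
    done
}

post_checkout() {
    local old=${1:-} new=${2:-} flag=${3:-}
    # A file checkout passes identical refs, so there is nothing to
    # diff; this sketch does not try to guess which paths were written.
    [ "$flag" = 1 ] || return 0
    case $old in
        *[!0]*)
            # ordinary branch switch: stamp only the paths that differ
            git diff --name-only -z "$old" "$new" | stamp_paths
            ;;
        *)
            # clone: the previous HEAD is the all-zero null ref
            git ls-files -z | stamp_paths
            ;;
    esac
}

post_checkout "$@"
```

One side effect to keep in mind: changing mtimes after checkout makes the stat data Git cached in the index stale, so the next "git status" has to re-read those files to confirm they are unchanged. That is another runtime cost of doing this outside of checkout itself.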

I think patches that made this configurable in some way in git would
be worth looking at, and due to the subject matter it might make sense
to have it in the core distribution as a non-hook, but I think the
default behavior should always be what it is now, since almost nobody
cares about these edge cases, and users should have to opt in to
behavior that works around them.
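Until something like that exists in core, a single user can already opt in across new repositories with Git's template directory: files under the directory named by the init.templateDir setting are copied into every repository created by "git init" or "git clone", and post-checkout also runs at the end of a clone. A minimal sketch, assuming bash and a hook script you already have; the function name install_template_hook and the ~/.git-templates location are illustrative choices, not anything Git defines.

```shell
#!/usr/bin/env bash
# Copy an existing post-checkout hook into a template directory and
# point the global init.templateDir setting at that directory.
#   $1 = path to the hook script (hypothetical; any executable file)
#   $2 = template directory (arbitrary, e.g. ~/.git-templates)
install_template_hook() {
    local hook=$1 tmpl=$2
    mkdir -p -- "$tmpl/hooks" || return 1
    cp -- "$hook" "$tmpl/hooks/post-checkout" || return 1
    chmod +x "$tmpl/hooks/post-checkout" || return 1
    # New "git init" / "git clone" runs now copy $tmpl into .git/
    git config --global init.templateDir "$tmpl"
}
```

For example: install_template_hook ./post-checkout ~/.git-templates. Repositories that already exist do not pick the hook up automatically; re-running "git init" inside one copies newly added template files without overwriting anything already there. Whether submodule clones also get the hook depends on them going through the same init/clone path, which is worth checking on the Git version in use.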



