Re: Refreshing index timestamps without reading content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 5, 2017 at 6:23 PM, Quentin Casasnovas
<quentin.casasnovas@xxxxxxxxxx> wrote:
> Hi guys,
>
> Apologies if this is documented somewhere, I have fairly bad search vudu
> skills.
>
> I'm looking for a way to cause a full refresh of the index without causing
> any read of the files, basically telling git "trust me, all worktree files
> are matching the index, but their stat information have changed".  I have
> read about the update-index --assume-unchanged and --skip-worktree flags in
> the documentation, but these do not cause any index refresh - rather, they
> fake that the respective worktree files are matching the index until you
> remove those assume-unchanged/skip-worktree bits.
>
> This might sound like a really weird thing to do, but I do have a use case
> for it - we have some build farm setup where the resulting objects of a
> compilation are stored on a shared server.  The source files are not stored
> on the shared server, but locally on each of the build server (as to
> decrease network load and make good use of local storage as caches).
>
> We then use an onion filesystem to mount the compiled objects on top of the
> local sources - and change the modification time of the source to be older
> than the object files, so that on subsequent builds, make does not rebuild
> the whole world.
>
> This works fine except for one thing, after changing the mtime of the
> source files, the first subsequent git command needing to compare the tree
> with the index will take a LONG time since it will read all of the object
> content:
>
>   cd linux-2.6
>
>   # Less than a second  when the index is up to date
>   time git status > /dev/null
>   git status 0.06s user 0.09s system 172% cpu 0.087 total
>                                               ~~~~~~~~~~~
>
>   # Change the mtime..
>   git ls-tree -r --name-only HEAD | xargs -n 1024 touch
>
>   # Now 30s..
>   time git status > /dev/null
>   git status  2.73s user 1.79s system 13% cpu 32.453 total
>                                               ~~~~~~~~~~~~
>
> The timing information above was captured on my laptop SSD and the penalty
> is obviously much higher on spinning disks - especially when this operation
> is done on *hundreds* of different work tree in parallel, all hosted on the
> same filesystem (it can take tens of minutes!).
>
> Is there any way to tell git, after the git ls-tree command above, to
> refresh its stat cache information and trust us that the file content has
> not changed, as to avoid any useless file read (though it will obviously
> will have to stat all of them, but that's not something we can really
> avoid)

I don't think there's any way to do that, unfortunately.

> If not, I am willing to implement a --assume-content-unchanged to the git
> update-index if you guys don't see something fundamentally wrong with this
> approach.

If you do that, I think you should go with either of the following options

- Extend git-update-index --index-info to take stat info as well (or
maybe make a new option instead). Then you can feed stat info directly
to git without a use-case-specific "assume-content-unchanged".

- Add "git update-index --touch" that does what "touch" does. In this
case, it blindly updates stat info to latest. But like touch, we can
also specify  mtime from command line if we need to. It's a bit less
generic than the above option, but easier to use.

Caveat: The options I'm proposing can be rejected. So maybe wait a bit
to see how people feel and perhaps send an RFC patch, again to gauge
the reception.

-- 
Duy



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]