Re: [RFC v1] Add virtual file system settings and hook proc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



not really a review, just  a couple quick notes..

On Tue, Oct 30, 2018 at 9:40 PM Ben Peart <peartben@xxxxxxxxx> wrote:
>
> From: Ben Peart <benpeart@xxxxxxxxxxxxx>
>
> On index load, clear/set the skip worktree bits based on the virtual
> file system data. Use virtual file system data to update skip-worktree
> bit in unpack-trees. Use virtual file system data to exclude files and
> folders not explicitly requested.
>
> Signed-off-by: Ben Peart <benpeart@xxxxxxxxxxxxx>
> ---
>
> We have taken several steps to make git perform well on very large repos.
> Some of those steps include: improving underlying algorithms, utilizing
> multi-threading where possible, and simplifying the behavior of some commands.
> These changes typically benefit all git repos to varying degrees.  While
> these optimizations all help, they are insufficient to provide adequate
> performance on the very large repos we often work with.
>
> To make git perform well on the very largest repos, we had to make more
> significant changes.  The biggest performance win by far is the work we have
> done to make git operations O(modified) instead of O(size of repo).  This
> takes advantage of the fact that the number of files a developer has modified
> is a tiny fraction of the overall repo size.
>
> We accomplished this by utilizing the existing internal logic for the skip
> worktree bit and excludes to tell git to ignore all files and folders other
> than those that have been modified.  This logic is driven by an external
> process that monitors writes to the repo and communicates the list of files
> and folders with changes to git via the virtual file system hook in this patch.
>
> The external process maintains a list of files and folders that have been
> modified.  When git runs, it requests the list of files and folders that
> have been modified via the virtual file system hook.  Git then sets/clears
> the skip-worktree bit on the cache entries and builds a hashmap of the
> modified files/folders that is used by the excludes logic to avoid scanning
> the entire repo looking for changes and untracked files.
>
> With this system, we have been able to make local git command performance on
> extremely large repos (millions of files, 1/2 million folders) entirely
> manageable (30 second checkout, 3.5 seconds status, 4 second add, 7 second
> commit, etc).
>
> Our desire is to eliminate all custom patches in our fork of git.  To that
> end, I'm submitting this as an RFC to see how much interest there is and how
> much willingness to take this type of change into git.

Most of these paragraphs (perhaps except the last one) should be part
of the commit message. You describe briefly what the patch does but
it's even more important to say why you want to do it.

> +core.virtualFilesystem::
> +       If set, the value of this variable is used as a command which
> +       will identify all files and directories that are present in
> +       the working directory.  Git will only track and update files
> +       listed in the virtual file system.  Using the virtual file system
> +       will supersede the sparse-checkout settings which will be ignored.
> +       See the "virtual file system" section of linkgit:githooks[6].

It sounds like "virtual file system" is just one of the use cases for
this feature, which is more about a dynamic source of sparse-checkout
bits. Perhaps name the config key with something along sparse checkout
instead of naming it after a use case.

This is a hook. I notice we start to avoid adding real hooks and just
add config keys instead. Eventually we should have config-based hooks,
but if we're going to add more like this, I think these should be in a
separate section, hook.virtualFileSystem or something.

I don't think the superseding makes sense. There's no reason this
could not be used in combination with $GIT_DIR/info/sparse-checkout.
If you don't want both, disable the other.

One last note. Since this is related to filesystem. Shouldn't it be
part of fsmonitor (the protocol, not the implementation)? Then
watchman user could use it to.
-- 
Duy



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux