RE: Uninitialized submodules as symlinks

> -----Original Message-----
> From: Stefan Beller [mailto:sbeller@xxxxxxxxxx]
> Sent: Friday, October 07, 2016 2:56 PM
> To: David Turner
> Cc: git@xxxxxxxxxxxxxxx
> Subject: Re: Uninitialized submodules as symlinks
> 
> On Fri, Oct 7, 2016 at 11:17 AM, David Turner <David.Turner@xxxxxxxxxxxx>
> wrote:
> > Presently, uninitialized submodules are materialized in the working tree
> > as empty directories.
> 
> Right, there has to be something, to hint at the user that creating a file
> with that path is probably not what they want.
> 
> >  We would like to consider having them be symlinks.  Specifically, we'd
> > like them to be symlinks into a FUSE filesystem which retrieves files on
> > demand.
> >
> > We've actually already got a FUSE filesystem written, but we use a
> > different (semi-manual) means to connect it to the initialized submodules.
> 
> So you currently do a
> 
>     git submodule init <pathspec>
>     custom-submodule make-symlink <pathspec>
> 
> ?

We do something like this:

For each initialized submodule: symlink it into the right place in .../somedir
For each uninitialized submodule: symlink its counterpart in the FUSE filesystem into the right place in .../somedir

So .../somedir has the structure of the git main repo, but is all symlinks -- some into FUSE, some into the git repo.

This means that when we initialize (or deinitialize) a submodule, we need to re-run the linking script.  
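
To make the linking step concrete, here is a rough sketch of what such a script does. It is not our actual script: the names plan_links and apply_links, and the repo_root, fuse_root and out_dir arguments, are made up for illustration, and it only covers the submodule entries, not the rest of the tree.

```python
import os
from pathlib import Path


def plan_links(submodule_paths, initialized, repo_root, fuse_root):
    """Map each submodule path to the symlink target it should get.

    Initialized submodules point into the git working tree; uninitialized
    ones point into the FUSE mount. All arguments are hypothetical inputs
    that the caller has to supply.
    """
    plan = {}
    for rel in submodule_paths:
        base = repo_root if rel in initialized else fuse_root
        plan[rel] = Path(base) / rel
    return plan


def apply_links(plan, out_dir):
    """Create or refresh one symlink per submodule under out_dir."""
    for rel, target in plan.items():
        link = Path(out_dir) / rel
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # replace a stale link left by an earlier run
        # Raises FileExistsError if a real file or directory is in the way.
        os.symlink(target, link)
```

Because the plan depends on which submodules are initialized, it has to be recomputed and reapplied after every init or deinit.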

> > We hope to release this FUSE filesystem as free software at some point
> > soon, but we do not yet have a fixed schedule for doing so.  Having to run
> > a command to create the symlink-based "union" filesystem is not optimal
> > (since we have to re-run it every time we initialize or deinitialize a
> > submodule).
> >
> > But if the uninitialized submodules could be symlinks into the FUSE
> > filesystem, we wouldn't have this problem.  This solution isn't
> > necessarily FUSE-specific -- perhaps someone would want copies of the same
> > submodule in multiple repos, and would want to save disk space by having
> > all copies point to the same place.  So the symlinks would be configured
> > by a per-submodule config variable.
> 
> I'd imagine that you want both a per-submodule config variable as well as
> a global variable that is a default for all submodules?
> 
>     git config submodule.trySymlinkDefault /mounted/fuse/
>     # any (new) submodule tries to be linked to /mounted/fuse/<path>
>     git config submodule.<name>.symlinked ~/my/private/symlinked
>     # The <name> submodule goes into another path.
> 
> As you propose the FUSE filesystem fetches files on demand, you probably
> want to disable things that scan the whole submodule, e.g. look at
> submodule.<name>.ignore to suppress status looking at all files.

I would actually expect git to detect that the symlink still points at the configured target and automatically decide not to look inside it.
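
Roughly, the check I have in mind looks like the sketch below. This is only an illustration of the idea, not git code; is_unmodified_symlink and configured_target are hypothetical names, and I am assuming the configured value is compared as an exact string.

```python
import os


def is_unmodified_symlink(path, configured_target):
    """Return True if path is a symlink whose stored target is exactly
    configured_target.

    Only the link itself is inspected (lstat and readlink), so nothing
    behind it, such as a FUSE mount, is opened or scanned.
    """
    if not os.path.islink(path):
        return False
    return os.readlink(path) == os.fspath(configured_target)
```

A status-like command could report "unchanged" whenever this returns True and fall back to the normal submodule handling otherwise.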
 
> When looking through the options, you could add the value "symlink" to
> submodule.<name>.update, which then respects the
> submodule.trySymlinkDefault if present, such that
> 
>     git clone --recurse-submodules ...
> 
> works and sets up the FUSE thing correctly.
> 
> How does the FUSE system handle different versions, i.e.
> `git submodule update` to checkout another version of the submodule?
> (btw, I plan on working on integrating submodules to "git checkout", so
> "submodule update" would not need to be run there, but we'd hook it into
> checkout instead)

The FUSE filesystem has a (virtual) directory for each SHA of the main repo, with each submodule mapped to the version of its code that was current at that SHA. Actually, it's a bit more complicated, because the uninitialized modules point to already-built binaries; that is, the symlink is to something like $fuse/$SHA/built/$submodule.

In our current setup, if you check out a new version of the main repo, you need to update all of the submodule symlinks again (as described above).

Under my proposal, I guess this would still need to happen.  A post-checkout hook could handle it either way.  Despite this flaw, switching a submodule between an initialized and deinitialized state would still be more seamless with the symlinks.
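
A post-checkout hook along these lines might do the relinking. This is a sketch under assumptions, not something we run today: fuse_target, relink and main are hypothetical names, the caller has to supply the working tree, the FUSE mount point and the list of uninitialized submodules, and it assumes those submodules are already symlinks in the working tree. The only part taken from git itself is the hook's argument order (previous HEAD, new HEAD, and a flag that is "1" for a branch checkout).

```python
import os
from pathlib import Path


def fuse_target(fuse_root, sha, submodule):
    """Build a path of the form $fuse/$SHA/built/$submodule."""
    return Path(fuse_root) / sha / "built" / submodule


def relink(worktree, fuse_root, new_sha, uninitialized):
    """Repoint each uninitialized submodule's symlink at new_sha."""
    for sub in uninitialized:
        link = Path(worktree) / sub
        if link.is_symlink():
            link.unlink()  # drop the link for the previous SHA
        os.symlink(fuse_target(fuse_root, new_sha, sub), link)


def main(argv, worktree, fuse_root, uninitialized):
    """Entry point for a post-checkout hook; returns True if it relinked.

    argv mirrors what git passes to the hook:
    [hook name, previous HEAD, new HEAD, branch-checkout flag].
    """
    _old_sha, new_sha, is_branch_checkout = argv[1:4]
    if is_branch_checkout != "1":
        return False  # file checkout: HEAD did not move, nothing to do
    relink(worktree, fuse_root, new_sha, uninitialized)
    return True
```

The hook would call main(sys.argv, ...) with whatever mount point and submodule list the site uses.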

> > Naturally, this would require some changes to code that examines the
> > working tree -- git status, git diff, etc.  They would have to report
> > "unchanged" for submodules which were still symlinks to the configured
> > location.  I have not yet looked at the implementation details beyond
> > this.
> >
> > Does this idea make any sense?  If I were to implement it (probably in a
> > few months, but no official timeline yet), would patches be considered?
> 
> I am happy to review patches.

Thanks.



