Dear all,

I'm currently missing a feature in git: the ability to clone a subfolder
as the root or, phrased differently, to use a subfolder as a worktree.

This mail has become rather long, so here is an overview: First, I state
the problem and give some use cases. Then, I list a couple of partial
workarounds and other approaches that are currently achievable with git.
Lastly, I sketch an idea of how this feature could be implemented within
git's object model.

The Problem
-----------

Assume you have a git repository A with the following file structure:

    foo/a.txt
    bar/b.txt
    bar/baz/c.txt

What I want to achieve is creating a git repo where bar/ equals the root
/, i.e. a repo B with the contents

    b.txt
    baz/c.txt

We can describe that as a lens or zoom of the repo A. I believe svn had
that capability but I'm not sure.

Applications for that feature
-----------------------------

Notation: I use "repo:path" to indicate that path should be seen
relative to the repository repo.

* I have a repo which mimics (parts of) my entire file system (think
  configuration files). I'd like to be able to check out the subfolder
  repo:/etc/foobar in the actual filesystem folder /etc, without
  checking out the rest of repo:/etc, as that would lead to disaster.

* I have a website project where the html files (which are not
  generated by a build script) are in repo:www/ and I'd like to check
  them out to /srv/www/ for deployment.

* Think of a big project with different components, possibly stacked
  deep in a directory structure. We want to work on a single component
  somewhere down that structure, e.g.
  repo:client/new/x11/gtk/daemon/test/testapp. We could use
  git-sparse-checkout for that, but that would leave us with a lot of
  quasi-empty directories client/new/x11/gtk/daemon/test/ where in this
  case only the testapp directory is relevant.

Things/Workarounds I am aware of
--------------------------------

1. git submodules

You probably think: use git-submodules.
However, bar/ is not a dependency or library where that would make
sense, but rather a part of repo A. So the logical dependency is
reversed: it's not A that depends on B, but rather B that depends on A.
In some use cases, changes in bar/ require additional changes in foo/
(in those use cases B is like a read-only view).

2. git clone + filter-branch

We can clone the repo and then run git-filter-branch (or its
alternative git-filter-repo):

    git clone /path/to/repo/A /path/to/repo/B
    cd /path/to/repo/B
    git filter-branch --subdirectory-filter bar -- --all

This creates a sort of read-only clone, but it has massive drawbacks:

* We cannot do a simple git pull to update repo B to the new state of
  repo A. To do that we have to clone and filter-branch again.
* It changes commit IDs.
* We cannot push changes made in B back to A.

3. git-subtree

We can use git-subtree to filter the subdirectory and then clone the
generated branch as repo B, like so:

    # in repo A
    cd /path/to/repo/A
    git subtree split --prefix=bar --annotate 'bar: ' --branch branch-bar
    git clone -b branch-bar /path/to/repo/A /path/to/repo/B

Here we have:

* It requires support from repo A, which must generate branch-bar.
* Repo A now must contain two commit histories (the main branch and
  branch-bar) of the same logical history. In particular, the commit
  IDs are different for the same logical commit in the main branch and
  in branch-bar.
* branch-bar must be regenerated every time. I have not (yet)
  investigated whether git-subtree is capable of continuing a split
  from the last commit. So far I have only managed to make it recreate
  all commits in branch-bar (but with the same commit IDs as before).
* Because the commit IDs of branch-bar do not change (at least when
  called with the same arguments), we can use git pull to update repo B.
* We can push changes in repo B back to branch-bar in repo A. But I
  currently do not see a simple way to incorporate these changes into
  the main branch.

4. git-sparse-checkout

We can use git-sparse-checkout like so:

    git clone /path/to/repo/A --no-checkout /path/to/repo/B
    cd /path/to/repo/B
    git sparse-checkout init
    git sparse-checkout set bar

This is kind of close to what I want, in the sense that we can push and
pull and the commit IDs are unaltered. However, it gets the directory
structure totally wrong, which is a no-go in some of the above use
cases.

An idea for a solution
----------------------

The following is an idea of how the above feature could be implemented.
This is just a rough sketch, and I have not thought about how this
approach would interact with other git tooling.

We add a --subfolder argument (for example; names are up for debate) to
git clone:

    git clone --subfolder bar /path/to/repo/A /path/to/repo/B

This clones the complete repo A but checks out the contents of path bar
in the root directory. HEAD points to the (full) commit. Additionally,
somewhere(tm) we store that we have zoomed in to only see paths in bar
(maybe git-worktree can be extended for that?).

That is, stuff under bar is treated like a checked-out repo, while all
other stuff is treated like being in a bare repo. (This at least should
be the guideline when thinking about the behaviour git should provide.)

git push and git pull do the normal update of the repo, but when
checking out files to the working directory only those files under bar/
are considered.

When editing and committing files in repo B, the following would be a
sane thing to do: The (old) tree of the current HEAD is taken, and then
the subtree corresponding to bar is replaced with the tree in the index.
That way we generate a full, valid commit which can be pushed back to
repo A.

If we switch/checkout to a branch/commit that has no bar/ directory,
then the working copy should be empty. If we add something and commit
it, then the parent tree objects of the new commit should be altered to
contain the path bar. As git does not track directories, this should
work out as expected.
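
The commit step described above can be approximated today with plumbing
commands in a full clone. The following is only a rough sketch under
stated assumptions, not a proposed implementation: graft_subtree is a
hypothetical helper (not an existing git command), it assumes it is run
from the top level of a non-bare clone of repo A with a configured user
identity, and it takes the replacement tree for the subfolder as an
already-written tree object. It works on a throwaway index, so the real
index and working copy stay untouched.

```shell
# graft_subtree PREFIX TREE MESSAGE  (hypothetical helper, for illustration)
# Prints the ID of a new commit whose tree equals HEAD's tree, except that
# the subtree at PREFIX is replaced by TREE (or added if PREFIX is absent).
graft_subtree() (
    prefix=$1
    new_subtree=$2
    message=$3

    # Use a scratch index inside a fresh directory; the file must not
    # exist yet, so a directory from mktemp -d is used instead of a file.
    tmp_dir=$(mktemp -d) || exit 1
    export GIT_INDEX_FILE="$tmp_dir/index"

    new_commit=$(
        # Start from the full tree of HEAD.
        git read-tree HEAD &&
        # Drop the old subtree from the scratch index (no-op if absent).
        git rm -r -q --cached --ignore-unmatch -- "$prefix" &&
        # Put the replacement tree at the same path.
        git read-tree --prefix="$prefix/" "$new_subtree" &&
        # Write the combined tree and wrap it in a commit on top of HEAD.
        git commit-tree "$(git write-tree)" -p HEAD -m "$message"
    )
    status=$?

    rm -rf "$tmp_dir"
    [ "$status" -eq 0 ] && printf '%s\n' "$new_commit"
)
```

For example, `graft_subtree bar "$tree" 'bar: update'` prints the ID of
a new commit whose parent is HEAD. The current branch is not moved, so
something like `git update-ref` would still be needed to advance it.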

Merge conflicts happen. If they happen for files inside the bar
directory, then we can do our usual stuff. Due to the flexibility of
git, it is also possible that the commits/trees to be merged have a
conflict outside of the bar directory. In that case we cannot produce a
working copy of the commit. Thus, it seems appropriate to abort whatever
we are doing and tell the user to use a full clone for the merge.

When cloning repo B to repo C, there are no restrictions whatsoever, as
B has a full copy of the repository (just not checked out). So when
looking from "outside", repo A and repo B are indistinguishable. Thus
the following works:

    git clone --subfolder bar /path/to/repo/A /path/to/repo/B
    git clone /path/to/repo/B /path/to/repo/C

Repo A and C are the same (both without a zoom).

    git clone --subfolder bar /path/to/repo/A /path/to/repo/B
    git clone --subfolder foo /path/to/repo/B /path/to/repo/C1
    git clone --subfolder foo /path/to/repo/A /path/to/repo/C2

Repo C1 and C2 are the same (both zoomed to foo/).

Non-goals: The following (weird?) things are outside the scope of this
idea.

* Zooming into two (or more) directories simultaneously, e.g. with
  repo A:

      foo1/bar/...
      foo2/foo3/baz/...

  and the hypothetical

      git clone --subfolder foo1 --subfolder foo2/foo3

  we get repo B:

      bar/...
      baz/...

* Converting a zoomed repo B into a full (non-bare) repo A. Although I
  think this could easily be achieved by deleting the stored reference
  to the subfolder and doing a `git reset --hard` on the working copy.

Summary
-------

The problem is not about the size of the checkout (which
sparse-checkout tackles), nor the size of the repo or the amount of data
which needs to be downloaded (both of which clone --depth tackles). It
is about getting the directory structure in repo B right while also
keeping a strong link to repo A as upstream to pull (and maybe push)
changes.

If I have missed any approach for a solution, I'd like to hear about it.

Best
Mickey