On Thu, Jul 28, 2016 at 6:02 PM, Robin Ruede <r.ruede@xxxxxxxxx> wrote:
> This patch series adds a `--sparse-prefix=` option to multiple commands,
> allowing fetching repository contents from only a subdirectory of a remote.
>
> This works along with sparse-checkout, and is especially useful for repositories
> where a subdirectory has meaning when standing alone.

Ah.. this is what I call narrow checkout [1] (but gmane is down at the
moment).

[1] http://thread.gmane.org/gmane.comp.version-control.git/155427

> * Motivation (example use cases)
>
> ...

nods nods.. all good stuff.

> * Open problems:
>
> 1. Currently all trees are still included. It would be possible to
> include only the trees relevant to the sparse files, which would significantly
> reduce the pack sizes for repositories containing a lot of small files changing
> often. For example package managers using git. Not sure in how many places all
> trees are presumed present.

You can limit some trees by passing a pathspec to "git rev-list" (in your
"list-objects" patch). All trees completely outside sub/dir will be
excluded; the trees leading to it (e.g. the root tree and "sub") are still
included.

Not having all trees opens up a new set of problems, though. This is what
I did in narrow clone: pass some directories (as pathspec) to rev-list on
the server side, then deal with the lack of trees on the client side.

> 2. This patch set implements it as a simple single prefix check command line
> option.
> Using the exclude_list format (same as in sparse-checkout) might be useful.
> The server needs to check these patterns for all files in history, so I'm not
> sure if allowing multiple/complex patterns is a good idea.

I would go with something other than sparse checkout, which I call narrow
checkout: instead of flattening the entire tree into the index and keeping
only files there, we keep the trees that we don't have as tree entries.
Those entries carry the same attributes as in sparse checkout (e.g. the
worktree is ignored) plus some of the submodule behavior (e.g. we don't
bother checking the associated hash). This approach [2] eliminates the
changes in cache-tree.c (i.e. 3/7). And you would need something like that
when you don't have all the trees (open problem 1), because you just can't
flatten trees you don't have.

[2] https://github.com/pclouds/git/commits/lanh/narrow-checkout (I think
    the core functionality is in place, but narrow operations still need
    more work)

> 3. This patch set assumes the sparse-prefix and sparse-checkout do not change.
> Running clone and fetch both need to have the --sparse-prefix= option, otherwise
> complete packs will be fetched. Not sure what the best way to store the
> information is, possibly create a new file `.git/sparse` similar to
> `.git/shallow` containing the path(s).

Something like .git/shallow, yes. It's similar in nature anyway (shallow
cuts the depth, you cut the sides).

> 4. Bitmap indices cannot be used, because they do not contain the paths of the
> objects. So for creating packs, the whole DAG has to be walked.

And shallow clones have this same problem. Something to be sorted out :)

> 5. Fsck complains about missing blobs. Should be fairly easy to fix.

Not really. You'll have to associate path information with blobs before
you can decide whether a blob should exist or not. Sparse patterns are
just not designed for that (tree walking). If you narrow (heh) it down to
just a path prefix rather than full-blown sparse patterns, then it's
feasible to walk the trees and filter. A subset of pathspec would be good
because we can already filter by pathspec, but I would not go with full
pathspec as the first step.

> 6. Tests and documentation are missing.

Personally I would go with my narrow clone approach, but the ability to
selectively exclude some large blobs is still good, I think. However,
another approach to excluding some blobs is the external object database
[3].
It gives you what you need with a lot less code impact (but you will not
be able to work offline 100% of the time like you can now with git).

[3] https://public-inbox.org/git/20160613085546.11784-1-chriscool%40tuxfamily.org/

-- 
Duy