On Fri, May 8, 2020 at 8:42 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> >
> >> One of the difficulties of using the sparse-checkout feature is not
> >> knowing which directories are absolutely needed for working in a portion
> >> of the repository. Some of this can be documented in README files or
> >> included in a bootstrapping tool along with the repository. This is done
> >> in an ad-hoc way by every project that wants to use it.
> >>
> >> Let's make this process easier for users by creating a way to define a
> >> useful sparse-checkout definition inside the Git tree data. This has
> >> several benefits. In particular, the data is available to anyone who has
> >> a copy of the repository without needing a different data source.
> >> Second, the needs of the repository can change over time and Git can
> >> present a way to automatically update the working directory as these
> >> sparse-checkout definitions change over time.
> >
> > And two lines of development can merge them together?
> >
> > Any time a new "feature" pops up that would eventually affect how
> > "git clone" and "git checkout" work based on untrusted user data, we
> > need to make sure there is no negative security implications.
> >
> > If it only boils down to "we have files that can record list of
> > leading directory names and without offering extra 'flexibility'", I
> > guess there aren't all that much that a malicious sparse definition
> > can do and we would be safe, though.
>
> Yes. I hope that we can be extremely careful with this feature.
> The RFC status of this series implicitly includes the question
> "Should we do this at all?" I think the benefits outweigh the
> risks, but we can minimize those risks with very careful design
> and implementation.
>
> >> To use this feature, add the "--in-tree" option when setting or adding
> >> directories to the sparse-checkout definition. For example:
> >>
> >>     $ git sparse-checkout set --in-tree .sparse/base
> >>     $ git sparse-checkout add --in-tree .sparse/extra
> >>
> >> These commands add values to the multi-valued config setting
> >> "sparse.inTree". When updating the sparse-checkout definition, these
> >> values describe paths in the repository to find the sparse-checkout
> >> data. After the commands listed earlier, we expect to see the following
> >> in .git/config.worktree:
> >>
> >>     [sparse]
> >>         intree = .sparse/base
> >>         intree = .sparse/extra
> >
> > What does this say in human words? "These two tracked files specify
> > which paths should be in the working tree"? Spelling it out here
> > would help readers of this commit.
>
> You got it. Sounds good.
>
> >> When applying the sparse-checkout definitions from this config, the
> >> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded.
> >
> > OK, so end-user edit to the working tree copy or what is added to
> > the index does not count and only the committed version gets used.
> >
> > That makes it simple---I was wondering how we would operate when
> > merging a branch with different contents in the .sparse/* files
> > until the conflicts are resolved.
>
> It's worth testing this case so we can be sure what happens.

During a merge or rebase or checkout -m, what happens if .sparse/extra
has the following working tree content:

[sparse]
    dir = D
    dir = X
<<<<<< HEAD
    dir = Y
|||||| MERGE_BASE
======
    inherit = .sparse/tools
>>>>>> MERGE_HEAD
    inherit = .sparse/base

and, of course, three different entries in the index?

Also, do we use the version of the --in-tree file from the latest
commit, from the index, or from the working tree? (This is a question
not only for merge and rebase, but also checkout with dirty changes and
even checkout -m.) Which one "wins"?

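To make the candidates concrete, here is a minimal Python sketch that
lists the object names Git's revision syntax offers for each version of
one in-tree file. The helper name candidate_sources and its labels are
hypothetical and not part of the series. HEAD:<path> is the committed
blob the quoted text says is loaded; :1:<path>, :2:<path> and :3:<path>
are the merge-base, "ours" and "theirs" index stages that exist while
the path is unmerged; the working tree copy is just the file on disk.

```python
def candidate_sources(path, conflicted):
    """Return (label, git object name or None) pairs for one in-tree file.

    None means "read the file from the working tree", which has no
    object name. The helper and its labels are illustrative only.
    """
    sources = [("commit", f"HEAD:{path}")]
    if conflicted:
        # An unmerged path has no stage-0 entry, only stages 1 to 3
        # (and some of those may be absent, e.g. no common ancestor).
        sources += [
            ("index base", f":1:{path}"),
            ("index ours", f":2:{path}"),
            ("index theirs", f":3:{path}"),
        ]
    else:
        sources.append(("index", f":0:{path}"))
    sources.append(("worktree", None))
    return sources


for label, name in candidate_sources(".sparse/extra", conflicted=True):
    # Each object name can be passed to "git show" or "git cat-file -p".
    print(f"{label:13} {name or '(file on disk)'}")
```

That gives up to five distinct contents for a single conflicted path,
which is why it matters to state which one the feature reads.
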
And what if the user updates and commits an ill-formed version of the
file -- is it equivalent to getting an empty cone with just the toplevel
directory, equivalent to getting a complete checkout of everything, or
something else?
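
As a way to pin down the choice, here is a rough Python sketch of the
decision point. Everything in it is an assumption made for
illustration: the grammar (a "[sparse]" header followed by "dir" and
"inherit" entries) is guessed from the example earlier in this mail, and
the function names and the two fallback names are hypothetical. It is
not what the series implements. It treats conflict markers as one kind
of ill-formed input and then applies whichever fallback is chosen.

```python
import re

# Six or more marker characters, so both the hand-typed markers in the
# example and Git's default seven-character markers are caught.
CONFLICT_MARKER = re.compile(r"^(<{6,}|\|{6,}|={6,}|>{6,})( |$)")
ENTRY = re.compile(r"^(dir|inherit)\s*=\s*(\S.*)$")


def parse_sparse_definition(text):
    """Parse the "[sparse]" format guessed from the example.

    Returns (dirs, inherits). Raises ValueError on anything unexpected.
    """
    dirs, inherits = [], []
    in_section = False
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if CONFLICT_MARKER.match(line):
            raise ValueError(f"line {lineno}: conflict marker")
        if line == "[sparse]":
            in_section = True
            continue
        entry = ENTRY.match(line)
        if not in_section or entry is None:
            raise ValueError(f"line {lineno}: unrecognized content")
        key, value = entry.group(1), entry.group(2).strip()
        (dirs if key == "dir" else inherits).append(value)
    return dirs, inherits


def resolve(text, fallback):
    """Apply one of the two fallbacks asked about above to a bad file.

    fallback is "toplevel-only" or "full-checkout" (hypothetical names).
    Returns a list of directories, or None for "check out everything".
    Following "inherit" entries is left out of this sketch.
    """
    try:
        dirs, _inherits = parse_sparse_definition(text)
        return dirs
    except ValueError:
        return [] if fallback == "toplevel-only" else None


conflicted = """[sparse]
    dir = D
    dir = X
<<<<<< HEAD
    dir = Y
|||||| MERGE_BASE
======
    inherit = .sparse/tools
>>>>>> MERGE_HEAD
    inherit = .sparse/base
"""

for fallback in ("toplevel-only", "full-checkout"):
    print(fallback, "->", resolve(conflicted, fallback))
```

Under the first fallback the user silently loses D and X from the
working directory; under the second they silently get everything. A
third option, refusing to update and keeping the previous definition,
would need the caller to remember the last good state.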