Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/7/2020 6:58 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>> One of the difficulties of using the sparse-checkout feature is not
>> knowing which directories are absolutely needed for working in a portion
>> of the repository. Some of this can be documented in README files or
>> included in a bootstrapping tool along with the repository. This is done
>> in an ad-hoc way by every project that wants to use it.
>>
>> Let's make this process easier for users by creating a way to define a
>> useful sparse-checkout definition inside the Git tree data. This has
>> several benefits. In particular, the data is available to anyone who has
>> a copy of the repository without needing a different data source.
>> Second, the needs of the repository can change over time and Git can
>> present a way to automatically update the working directory as these
>> sparse-checkout definitions change over time.
> 
> And two lines of development can merge them together?
> 
> Any time a new "feature" pops up that would eventually affect how
> "git clone" and "git checkout" work based on untrusted user data, we
> need to make sure there is no negative security implications.  
> 
> If it only boils down to "we have files that can record list of
> leading directory names and without offering extra 'flexibility'", I
> guess there aren't all that much that a malicious sparse definition
> can do and we would be safe, though.

Yes. I hope that we can be extremely careful with this feature.
The RFC status of this series implicitly includes the question
"Should we do this at all?" I think the benefits outweigh the
risks, but we can minimize those risks with very careful design
and implementation.

>> To use this feature, add the "--in-tree" option when setting or adding
>> directories to the sparse-checkout definition. For example:
>>
>>   $ git sparse-checkout set --in-tree .sparse/base
>>   $ git sparse-checkout add --in-tree .sparse/extra
>>
>> These commands add values to the multi-valued config setting
>> "sparse.inTree". When updating the sparse-checkout definition, these
>> values describe paths in the repository to find the sparse-checkout
>> data. After the commands listed earlier, we expect to see the following
>> in .git/config.worktree:
>>
>> 	[sparse]
>> 		intree = .sparse/base
>> 		intree = .sparse/extra
> 
> What does this say in human words?  "These two tracked files specify
> which paths should be in the working tree"?  Spelling it out here
> would help readers of this commit.

You got it. Sounds good.

>> When applying the sparse-checkout definitions from this config, the
>> blobs at HEAD:.sparse/base and HEAD:.sparse/extra are loaded. 
> 
> OK, so end-user edit to the working tree copy or what is added to
> the index does not count and only the committed version gets used.
> 
> That makes it simple---I was wondering how we would operate when
> merging a branch with different contents in the .sparse/* files
> until the conflicts are resolved.

It's worth testing this case so we can be sure what happens.

>> In those
>> files, the multi-valued config values "sparse.dir" are considered as
>> the directories to construct a cone mode sparse-checkout file. The end
>> result is as if these paths were provided to "git sparse-checkout set"
>> in cone mode.
> 
> OK.
> 
>> For example, suppose .sparse/base had the following content:
>>
>> 	[sparse]
>> 		dir = A
>> 		dir = B/C
>> 		dir = D/E/F
>>
>> and .sparse/extra had the following content:
>>
>> 	[sparse]
>> 		dir = D
>> 		dir = X
>>
>> Then, the output of "git sparse-checkout list" would be
>>
>> 	A
>> 	B/C
>> 	D
>> 	X
>>
>> Note that since "D" contains "D/E/F", that directory replaces the
>> position of "D/E/F" in the list.
>>
>> Since these are parsed using the config library, the parser is robust
>> enough to understand comments and complicated string values.
>>
>> The key benefit to this approach is that it can be extended by defining
>> new config values. In a later change, we will introduce "sparse.inherit"
>> to point to another file in the tree. This will solve the problem of
>> editing many files when core dependencies change.
> 
> With only a multi-valued sparse.dir elements allowed in these
> in-tree .sparse/* files, I guess there isn't much mischeaf a
> malicious .sparse/* file can do.  Can it try to [includeIf] some
> paths external to the repository to cause a remote attacker to learn
> about the paths on the local system, perhaps? 

I was unaware of includes in the config format [1]. While the behavior
should be safe because we are only pulling very specific data from the
config, it would be best to see if we can disable includes when reading
config from a blob.

[1] https://git-scm.com/docs/git-config#_includes

> 
>> diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt
>> new file mode 100644
>> index 00000000000..c1fce87cd33
>> --- /dev/null
>> +++ b/Documentation/config/sparse.txt
>> @@ -0,0 +1,15 @@
>> +sparse.inTree::
>> +	The `core.sparseCheckout` config option enables the `sparse-checkout`
>> +	feature, but if there are any values for the multi-valued
>> +	`sparse.inTree` config option, then the sparse-checkout patterns are
>> +	defined by parsing the files listed in these values. See
>> +	linkgit:git-sparse-checkout[1] for more information.
> 
> Does "... but if ... then" mean "by parsing the files and ignoring
> all other things that may otherwise define patterns"?
> 
>> +sparse.dir::
>> +	This config setting is ignored if present in the repository config.
>> +	Instead, this multi-valued option is present in the files listed by
>> +	`sparse.inTree` and specifies the directories needed in the
>> +	working directory. The union of all `sparse.dir` values across all
>> +	`sparse.inTree` files forms the input for `git sparse-checkout set`
>> +	in cone mode.  See linkgit:git-sparse-checkout[1] for more
>> +	information.
> 
> If this is *not* a config in the usual sense, we probably should not
> include it in this document and in the "git config --help" output.
> That will allow us to drop the first sentence.
> 
> Those .sparse/* in-tree files are like .gitmodules in the sense that
> they happen to use the same syntax so that the parser can be shared,
> but they are not allowed to affect end-user configuration
> (e.g. writing "[diff] external=rm -fr ." in the file has no effect)
> at all, right?
> 
> And we should describe what we can write in these in-tree files
> separately and make it clear that they are _different_ from the
> configuration variables.

I can move the "keys" from the config documentation and into the
git-sparse-checkout.txt file.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux