Re: [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode

Derrick Stolee <stolee@xxxxxxxxx> · Tue, 14 Jan 2020 17:11:03 -0500

On 1/14/2020 4:25 PM, Jeff King wrote:
> On Tue, Jan 14, 2020 at 07:26:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>>
>> If a user somehow creates a directory with an asterisk (*) or backslash
>> (\), then the "git sparse-checkout set" command will struggle to provide
>> the correct pattern in the sparse-checkout file. When not in cone mode,
>> the provided pattern is written directly into the sparse-checkout file.
>> However, in cone mode we expect a list of paths to directories and then
>> we convert those into patterns.
>>
>> Even more specifically, the goal is to always allow the following from
>> the root of a repo:
>>
>>   git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
>>
>> The ls-tree command provides directory names with an unescaped asterisk.
>> It also quotes the directories that contain an escaped backslash. We
>> must remove these quotes, then keep the escaped backslashes.
> 
> Do we need to document these rules somewhere? Naively I'd expect
> "--stdin" to take in literal pathnames. But of course it can't represent
> a path with a newline. So perhaps it makes sense to take quoted names by
> default, and allow literal NUL-separated input with "-z" if anybody
> wants it.

This is worth thinking about the right way to describe the rules:

1. You don't _need_ quotes. They happen to come along for the ride in
  'git ls-tree' so it doesn't mess up shell scripts that iterate on
  those entries. At least, that's why I think they are quoted.

2. If you use quotes, the first layer of quotes will be removed.

How much of this needs to be documented explicitly, or how much should
we say "The input format matches what we would expect from 'git ls-tree
--name-only'"?

Thanks,
-Stolee