On 1/14/2020 5:48 PM, Jeff King wrote: > On Tue, Jan 14, 2020 at 05:11:03PM -0500, Derrick Stolee wrote: > >>> Do we need to document these rules somewhere? Naively I'd expect >>> "--stdin" to take in literal pathnames. But of course it can't represent >>> a path with a newline. So perhaps it makes sense to take quoted names by >>> default, and allow literal NUL-separated input with "-z" if anybody >>> wants it. >> >> This is worth thinking about the right way to describe the rules: >> >> 1. You don't _need_ quotes. They happen to come along for the ride in >> 'git ls-tree' so it doesn't mess up shell scripts that iterate on >> those entries. At least, that's why I think they are quoted. > > It's not just shell scripts. Without quoting, the syntax becomes > ambiguous (e.g., imagine a file with a newline in it). So most Git > output that shows a filename will quote it if necessary, unless > NUL separators are being used. Good to know. >> 2. If you use quotes, the first layer of quotes will be removed. > > I take this to mean that anything starting with a double-quote will have > the outer layer removed, and backslash escapes inside expanded. And > anything without a starting double quote (even if it has internal > backslash escapes!) will be taken literally. Hm. Perhaps you are right! The ls-tree output for the test example is: deep folder1 folder2 "zbad\\dir" zdoes*exist so the "zdoes*exist" value is not escaped. I believe the current logic does something extra: consider supplying this input to 'git sparse-checkout set --stdin': deep folder1 folder2 "zbad\\dir" zdoes\*exist then should we un-escape "\*" to "*"? Or is this not a valid input because a backslash should have been quoted into C-style quotes? The behavior in the current series allows this output that would never be written by "git ls-tree". > That would match how things like "update-index --index-info" work. > > As far as implementation, I know you're trying to keep some of the > escaping, but I think it might make more sense to do use > unquote_c_style() to parse the input (see update-index's use for some > prior art), and then re-quote as necessary to put things into the > sparse-checkout file (I guess quoting more than just quote_c_style() > would do, since you need to quote glob metacharacters like '*' and > probably "!"). But as much as possible, I think you'd want literal > strings inside the program, and just quoting/unquoting at the edges. I was playing around with this, and I think that quote_c_style() is necessary for the output, but we have a strange in-memory situation for the other escaping: we both fill the hashsets with the un-escaped data and fill the pattern list with the escaped patterns. I'll add a commit with the quote_c_style() calls during the 'list' subcommand. >> How much of this needs to be documented explicitly, or how much should >> we say "The input format matches what we would expect from 'git ls-tree >> --name-only'"? > > I think it's fine to say that, and maybe call attention to the quoting. > Like: > > The input format matches the output of `git ls-tree --name-only`. This > includes interpreting pathnames that begin with a double quote (") as > C-style quoted strings. > > Disappointingly, update-index does not seem to explain the rules > anywhere. fast-import does cover it. Maybe it's something that ought to > be hoisted out into gitcli(7) or similar (or maybe it has been and I > just can't find it). I'll start the process by using your recommended language. I noticed also that the 'set' command doesn't actually document what happens when in cone mode, so I will correct that, too. Thanks, -Stolee