On Tue, Jan 14, 2020 at 05:11:03PM -0500, Derrick Stolee wrote: > > Do we need to document these rules somewhere? Naively I'd expect > > "--stdin" to take in literal pathnames. But of course it can't represent > > a path with a newline. So perhaps it makes sense to take quoted names by > > default, and allow literal NUL-separated input with "-z" if anybody > > wants it. > > This is worth thinking about the right way to describe the rules: > > 1. You don't _need_ quotes. They happen to come along for the ride in > 'git ls-tree' so it doesn't mess up shell scripts that iterate on > those entries. At least, that's why I think they are quoted. It's not just shell scripts. Without quoting, the syntax becomes ambiguous (e.g., imagine a file with a newline in it). So most Git output that shows a filename will quote it if necessary, unless NUL separators are being used. > 2. If you use quotes, the first layer of quotes will be removed. I take this to mean that anything starting with a double-quote will have the outer layer removed, and backslash escapes inside expanded. And anything without a starting double quote (even if it has internal backslash escapes!) will be taken literally. That would match how things like "update-index --index-info" work. As far as implementation, I know you're trying to keep some of the escaping, but I think it might make more sense to do use unquote_c_style() to parse the input (see update-index's use for some prior art), and then re-quote as necessary to put things into the sparse-checkout file (I guess quoting more than just quote_c_style() would do, since you need to quote glob metacharacters like '*' and probably "!"). But as much as possible, I think you'd want literal strings inside the program, and just quoting/unquoting at the edges. > How much of this needs to be documented explicitly, or how much should > we say "The input format matches what we would expect from 'git ls-tree > --name-only'"? I think it's fine to say that, and maybe call attention to the quoting. Like: The input format matches the output of `git ls-tree --name-only`. This includes interpreting pathnames that begin with a double quote (") as C-style quoted strings. Disappointingly, update-index does not seem to explain the rules anywhere. fast-import does cover it. Maybe it's something that ought to be hoisted out into gitcli(7) or similar (or maybe it has been and I just can't find it). -Peff