On Fri, Jan 24, 2020 at 04:10:21PM -0500, Derrick Stolee wrote: > Hm. Perhaps you are right! The ls-tree output for the test example > is: > > deep > folder1 > folder2 > "zbad\\dir" > zdoes*exist > > so the "zdoes*exist" value is not escaped. I believe the current > logic does something extra: consider supplying this input to > 'git sparse-checkout set --stdin': > > deep > folder1 > folder2 > "zbad\\dir" > zdoes\*exist > > then should we un-escape "\*" to "*"? Or is this not a valid input > because a backslash should have been quoted into C-style quotes? I'd think we should not un-escape anything, because we weren't told that this was a C-style quoted string by the presence of an opening double-quote. And that's how, say, update-index behaves: $ blob=$(echo foo | git hash-object -w --stdin) $ printf '100644 %s\t%s\n' \ $blob 'just*asterisk' \ $blob 'backslash\without\quotes' \ $blob '"backslash\\with\\quotes"' | git update-index --index-info which results in: $ git ls-files "backslash\\with\\quotes" "backslash\\without\\quotes" just*asterisk [same, but without quoting] $ git ls-files -z | tr '\0' '\n' backslash\with\quotes backslash\without\quotes just*asterisk > The behavior in the current series allows this output that would > never be written by "git ls-tree". Yes, I think we'd never write that, because ls-tree would quote anything with a backslash in it, even though it's not strictly necessary. But it would be valid input to specify a file that has backslashes but not double-quotes, and I think sparse-checkout should be changed to match update-index here. > I was playing around with this, and I think that quote_c_style() is > necessary for the output, but we have a strange in-memory situation > for the other escaping: we both fill the hashsets with the un-escaped > data and fill the pattern list with the escaped patterns. Yeah, but I think that the syntactic escaping on input might not have identical rules to the escaping needed for the patterns. So it makes sense to me to handle input as a separate mechanism, get a pristine copy of what the user was trying to communicate to us, and then re-escape whatever we need to put into the pattern list. And ultimately the flow would be something like: - read input - if argument is from command-line, use it verbatim - else if reading stdin with "-z", use it verbatim - else if line starts with double-quote, unquote_c_style() - else use line verbatim - the result is a single pristine filename - fill hashset with pristine filenames - generate pattern list to write to sparse file, escaping filenames as necessary according to sparse-pattern rules Obviously you don't have a "-z" yet, but I think it's something we'd probably want in the long run. And anything coming from the command-line shouldn't need quoting to get it to us either (and so we'd need to escape before writing to the sparse file). -Peff