Re: [PATCH 8/8] sparse-checkout: write escaped patterns in cone mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 24, 2020 at 04:10:21PM -0500, Derrick Stolee wrote:

> Hm. Perhaps you are right! The ls-tree output for the test example
> is:
> 
> 	deep
> 	folder1
> 	folder2
> 	"zbad\\dir"
> 	zdoes*exist
> 
> so the "zdoes*exist" value is not escaped. I believe the current
> logic does something extra: consider supplying this input to
> 'git sparse-checkout set --stdin':
> 
> 	deep
> 	folder1
> 	folder2
> 	"zbad\\dir"
> 	zdoes\*exist
> 
> then should we un-escape "\*" to "*"? Or is this not a valid input
> because a backslash should have been quoted into C-style quotes?

I'd think we should not un-escape anything, because we weren't told that
this was a C-style quoted string by the presence of an opening
double-quote. And that's how, say, update-index behaves:

  $ blob=$(echo foo | git hash-object -w --stdin)
  $ printf '100644 %s\t%s\n' \
      $blob 'just*asterisk' \
      $blob 'backslash\without\quotes' \
      $blob '"backslash\\with\\quotes"' |
    git update-index --index-info

which results in:

  $ git ls-files
  "backslash\\with\\quotes"
  "backslash\\without\\quotes"
  just*asterisk

  [same, but without quoting]
  $ git ls-files -z | tr '\0' '\n'
  backslash\with\quotes
  backslash\without\quotes
  just*asterisk

> The behavior in the current series allows this output that would
> never be written by "git ls-tree".

Yes, I think we'd never write that, because ls-tree would quote anything
with a backslash in it, even though it's not strictly necessary. But it
would be valid input to specify a file that has backslashes but not
double-quotes, and I think sparse-checkout should be changed to match
update-index here.

> I was playing around with this, and I think that quote_c_style() is
> necessary for the output, but we have a strange in-memory situation
> for the other escaping: we both fill the hashsets with the un-escaped
> data and fill the pattern list with the escaped patterns.

Yeah, but I think that the syntactic escaping on input might not have
identical rules to the escaping needed for the patterns.

So it makes sense to me to handle input as a separate mechanism, get a
pristine copy of what the user was trying to communicate to us, and then
re-escape whatever we need to put into the pattern list.

And ultimately the flow would be something like:

  - read input
    - if argument is from command-line, use it verbatim
    - else if reading stdin with "-z", use it verbatim
    - else if line starts with double-quote, unquote_c_style()
    - else use line verbatim
    - the result is a single pristine filename
  - fill hashset with pristine filenames
  - generate pattern list to write to sparse file, escaping filenames as
    necessary according to sparse-pattern rules

Obviously you don't have a "-z" yet, but I think it's something we'd
probably want in the long run. And anything coming from the command-line
shouldn't need quoting to get it to us either (and so we'd need to
escape before writing to the sparse file).

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux