On 1/29/2020 5:33 AM, Jeff King wrote: > On Wed, Jan 29, 2020 at 05:17:13AM -0500, Jeff King wrote: > >>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >>> >>> If a user somehow creates a directory with an asterisk (*) or backslash >>> (\), then the "git sparse-checkout set" command will struggle to provide >>> the correct pattern in the sparse-checkout file. When not in cone mode, >>> the provided pattern is written directly into the sparse-checkout file. >>> However, in cone mode we expect a list of paths to directories and then >>> we convert those into patterns. >> >> Is this really about cone mode? It seems more like it is about --stdin. >> I.e., what are the rules for when the input is a filename and when it is >> a pattern? In our earlier discussion, I had assumed that command-line >> arguments to "sparse-checkout set" were actual filenames, and "--stdin" >> just read them from stdin. >> >> But looking at the documentation, they are always called "patterns" on >> the command-line. Should the "--stdin" documentation make it clear that >> we are no longer taking patterns, but instead actual filenames? >> >> Or am I confused, and in non-cone-mode the "ls-tree | sparse-checkout" >> pipeline is not supposed to work at all? (I.e., they really are always >> patterns)? > > Hmph, sorry, I _was_ just confused. I was reading a copy of the manpage > without your final patch, which made things much clearer. > > So OK, I think the resulting documentation does make things clear. And > this is just about cone mode, not --stdin. Please ignore my ramblings in > the rest of the replied-to message. But... > >>> Even more specifically, the goal is to always allow the following from >>> the root of a repo: >>> >>> git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin >>> >>> The ls-tree command provides directory names with an unescaped asterisk. >>> It also quotes the directories that contain an escaped backslash. We >>> must remove these quotes, then keep the escaped backslashes. >>> >>> However, there is some care needed for the timing of these escapes. The >>> in-memory pattern list is used to update the working directory before >>> writing the patterns to disk. Thus, we need the command to have the >>> unescaped names in the hashsets for the cone comparisons, then escape >>> the patterns later. >> >> OK, this part make sense. > > You could also demonstrate this even without --stdin with something > like: > > git config core.sparsecheckoutcone true > git sparse-checkout set 'foo*bar' > > which should take that as a literal filename and put the pattern > 'foo\*bar' in the sparse-checkout file. And your tests do cover that. > > So really there are two separate bugs here, and it might be a little > easier to explain the "timing of these escapes" thing by doing them > separately. I.e., the case above needs escaping and we could demonstrate > the bug with a command-line "set". And then follow up by fixing the > problem with correctly de-quoting --stdin. I've locally split the commit into two parts. That makes things much simpler to read. >>> +static char *escaped_pattern(char *pattern) >> [...] >> Do we need to catch other metacharacters here (using is_glob_special() >> perhaps)? > > After de-confusing myself, I think the individual code comments I wrote > still apply though (especially this one). I've applied the smaller comments and am now investigating the right thing to do with other is_glob_special() characters. There is a small chance that I can replace any "c == '*' || c == '\'" with is_glob_special(), but we shall see. At the very least, I'll need to expand my tests. Thanks, -Stolee