On Tue, Jan 28, 2020 at 06:26:41PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > If a user somehow creates a directory with an asterisk (*) or backslash > (\), then the "git sparse-checkout set" command will struggle to provide > the correct pattern in the sparse-checkout file. When not in cone mode, > the provided pattern is written directly into the sparse-checkout file. > However, in cone mode we expect a list of paths to directories and then > we convert those into patterns. Is this really about cone mode? It seems more like it is about --stdin. I.e., what are the rules for when the input is a filename and when it is a pattern? In our earlier discussion, I had assumed that command-line arguments to "sparse-checkout set" were actual filenames, and "--stdin" just read them from stdin. But looking at the documentation, they are always called "patterns" on the command-line. Should the "--stdin" documentation make it clear that we are no longer taking patterns, but instead actual filenames? Or am I confused, and in non-cone-mode the "ls-tree | sparse-checkout" pipeline is not supposed to work at all? (I.e., they really are always patterns)? > Even more specifically, the goal is to always allow the following from > the root of a repo: > > git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin > > The ls-tree command provides directory names with an unescaped asterisk. > It also quotes the directories that contain an escaped backslash. We > must remove these quotes, then keep the escaped backslashes. > > However, there is some care needed for the timing of these escapes. The > in-memory pattern list is used to update the working directory before > writing the patterns to disk. Thus, we need the command to have the > unescaped names in the hashsets for the cone comparisons, then escape > the patterns later. OK, this part make sense. > Use unquote_c_style() when parsing lines from stdin. Command-line > arguments will be parsed as-is, assuming the user can do the correct > level of escaping from their environment to match the exact directory > names. I think there's two issues here: escaping characters from the shell so that they make it intact to Git, and the question of whether Git is expecting patterns or raw filenames. I agree the user is responsible for the shell half, but I think we need to clarify what we're expecting. I.e., if I say: git sparse-checkout set 'f*' am I trying to match "foo", or the literal file "f*"? > +static char *escaped_pattern(char *pattern) > +{ > + char *p = pattern; > + struct strbuf final = STRBUF_INIT; > + > + while (*p) { > + if (*p == '*' || *p == '\\') > + strbuf_addch(&final, '\\'); > + > + strbuf_addch(&final, *p); > + p++; > + } > + > + return strbuf_detach(&final, NULL); > +} Do we need to catch other metacharacters here (using is_glob_special() perhaps)? > @@ -423,8 +442,21 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix) > pl.use_cone_patterns = 1; > > if (set_opts.use_stdin) { > - while (!strbuf_getline(&line, stdin)) > + struct strbuf unquoted = STRBUF_INIT; > + while (!strbuf_getline(&line, stdin)) { > + if (line.buf[0] == '"') { > + strbuf_setlen(&unquoted, 0); A minor nit, but strbuf_reset(&unquoted) would be more idiomatic here. > + if (unquote_c_style(&unquoted, line.buf, NULL)) > + die(_("unable to unquote C-style string '%s'"), > + line.buf); > + > + strbuf_swap(&unquoted, &line); > + } > + > strbuf_to_cone_pattern(&line, &pl); > + } OK, overall this input procedure makes sense to me. -Peff