On 1/29/2020 5:03 AM, Jeff King wrote: > On Tue, Jan 28, 2020 at 06:26:40PM +0000, Derrick Stolee via GitGitGadget wrote: > >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> >> In cone mode, the sparse-checkout feature uses hashset containment >> queries to match paths. Make this algorithm respect escaped asterisk >> (*) and backslash (\) characters. > > Do we also need to worry about other glob metacharacters? E.g., "?" or > ranges like "[A-Z]"? These are not part of the .gitignore patterns [1]. [1] https://git-scm.com/docs/gitignore#_pattern_format >> +static char *dup_and_filter_pattern(const char *pattern) >> +{ >> + char *set, *read; >> + char *result = xstrdup(pattern); >> + >> + set = result; >> + read = result; >> + >> + while (*read) { >> + /* skip escape characters (once) */ >> + if (*read == '\\') >> + read++; >> + >> + *set = *read; >> + >> + set++; >> + read++; >> + } >> + *set = 0; >> + >> + if (*(read - 2) == '/' && *(read - 1) == '*') >> + *(read - 2) = 0; >> + >> + return result; >> +} > > Do we need to check that the pattern is longer than 1 character here? If > it's a single character, it seems like this "*(read - 2)" will > dereference the byte before the string. This method is only called by add_pattern_to_hashsets(), which has a guard against paths of length less than 2, but thats' no excuse for dangerous pointer arithmetic here. But you also point out an even more confusing thing: why are we modifying based on the 'read' pointer, and not the 'set' pointer? This seems to work _accidentally_ only when the pattern has "<something>/*" and "<something>" has no escape characters. I had to recall exactly why we are dropping this "/*", but it's because the pattern _actually_ ends with "/*/" but the in-memory pattern has already dropped that last slash and applied PATTERN_FLAG_MUSTBEDIR. Here is a diff that I can apply to this patch to fix this problem _and_ demonstrate it in the tests: diff --git a/dir.c b/dir.c index 579f274d13..277577c8bf 100644 --- a/dir.c +++ b/dir.c @@ -633,6 +633,7 @@ int pl_hashmap_cmp(const void *unused_cmp_data, static char *dup_and_filter_pattern(const char *pattern) { char *set, *read; + size_t count = 0; char *result = xstrdup(pattern); set = result; @@ -647,11 +648,14 @@ static char *dup_and_filter_pattern(const char *pattern) set++; read++; + count++; } *set = 0; - if (*(read - 2) == '/' && *(read - 1) == '*') - *(read - 2) = 0; + if (count > 2 && + *(set - 1) == '*' && + *(set - 2) == '/') + *(set - 2) = 0; return result; } diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh index 0a21a5e15d..20b0465f77 100755 --- a/t/t1091-sparse-checkout-builtin.sh +++ b/t/t1091-sparse-checkout-builtin.sh @@ -383,6 +383,7 @@ test_expect_success BSLASHPSPEC 'pattern-checks: escaped "*"' ' /* !/*/ /zbad\\dir/ + !/zbad\\dir/*/ /zdoes\*not\*exist/ /zdoes\*exist/ EOF With this extra line in the test, but compiling the old version of this patch, the test fails with: 'err' is not empty, it contains: + cat err warning: unrecognized negative pattern: '/zbad\\dir/*' warning: disabling cone pattern matching To ensure this negative pattern exists in the later patch where we set the patterns using the builtin, I'll add "zbad\\dir/bogus" to the list of directories to include, which will add another pattern to the set. Thanks, -Stolee