Re: [PATCHv8 5/5] pathspec: allow querying for attributes

Stefan Beller <sbeller@xxxxxxxxxx> · Thu, 19 May 2016 13:42:19 -0700

On Thu, May 19, 2016 at 11:53 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
>> index cafc284..aa9f220 100644
>> --- a/Documentation/glossary-content.txt
>> +++ b/Documentation/glossary-content.txt
>> @@ -384,6 +384,23 @@ full pathname may have special meaning:
>>  +
>>  Glob magic is incompatible with literal magic.
>>
>> +attr;;
>> +     Additionally to matching the pathspec, the path must have the
>> +     attribute as specified. The syntax for specifying the required
>> +     attributes is "`attr: [mode] <attribute name> [=value]`"
>> ++
>> +Attributes can have 4 states (Set, Unset, Set to a value, unspecified) and
>> +you can query each attribute for certain states. The "`[mode]`" is a special
>> +character to indicate which attribute states are looked for. The following
>> +modes are available:
>> +
>> + - an empty "`[mode]`" matches if the attribute is set
>> + - "`-`" the attribute must be unset
>> + - "`!`" the attribute must be unspecified
>> + - an empty "`[mode]`" combined with "`[=value]`" matches if the attribute has
>> +   the given value.
>> ++
>
> As an initial design, I find this much more agreeable than the
> previous rounds.  I however find the phrasing of the above harder
> than necessary to understand, for a few reasons.
>
>  * Mixed use of "X matches if ..." and "... must be Y" makes it
>    unclear if they are talking about different kind of things, or
>    the same kind of things in merely different ways.
>
>  * It does not make it clear "=value" is only meaningful when [mode]
>    is empty.
>
> Perhaps dropping the '[mode]' thing altogether and instead saying
>
>         After `attr:` comes a space separated list of "attribute
>         requirements", all of which must be met in order for the
>         path to be considered a match; this is in addition to the
>         usual non-magic pathspec pattern matching.
>
>         Each of the attribute requirements for the path takes one of
>         these forms:
>
>         - "`ATTR`" requires that the attribute `ATTR` must be set.
>
>         - "`-ATTR`" requires that the attribute `ATTR` must be unset.
>
>         - "`ATTR=VALUE`" requires that the attribute `ATTR` must be
>           set to the string `VALUE`.
>
>         - "`!ATTR`" requires that the attribute `ATTR` must be
>           unspecified.
>
> would make the resulting text easier to read?

That is way better!

>
>> +static int match_attrs(const char *name, int namelen,
>> +                    const struct pathspec_item *item)
>> +{
>> +     int i;
>> +
>> +     git_check_attr_counted(name, namelen, item->attr_check);
>> +     for (i = 0; i < item->attr_match_nr; i++) {
>> +             const char *value;
>> +             int matched;
>> +             enum attr_match_mode match_mode;
>> +
>> +             value = item->attr_check->check[i].value;
>> +             match_mode = item->attr_match[i].match_mode;
>> +
>> +             if (ATTR_TRUE(value))
>> +                     matched = match_mode == MATCH_SET;
>> +             else if (ATTR_FALSE(value))
>> +                     matched = match_mode == MATCH_UNSET;
>> +             else if (ATTR_UNSET(value))
>> +                     matched = match_mode == MATCH_UNSPECIFIED;
>
> readability nit:
>
>         matched = (match_mode == MATCH_WHATEVER);
>
> would be easier to view

ok.

>
>> +             else
>> +                     matched = (match_mode == MATCH_VALUE &&
>> +                                !strcmp(item->attr_match[i].value, value));
>
> and would match the last case above better.
>
>> +static void parse_pathspec_attr_match(struct pathspec_item *item, const char *value)
>> +{
>> +     struct string_list_item *si;
>> +     struct string_list list = STRING_LIST_INIT_DUP;
>> +
>> +
>> +     if (!value || !strlen(value))
>> +             die(_("attr spec must not be empty"));
>> +
>> +     string_list_split(&list, value, ' ', -1);
>> +     string_list_remove_empty_items(&list, 0);
>> +
>> +     if (!item->attr_check)
>> +             item->attr_check = git_attr_check_alloc();
>> +     else
>> +             die(_("Only one 'attr:' specification is allowed."));
>> +
>> +     ALLOC_GROW(item->attr_match, item->attr_match_nr + list.nr, item->attr_match_alloc);
>> +
>> +     for_each_string_list_item(si, &list) {
>> +             size_t attr_len;
>> +
>> +             int j = item->attr_match_nr++;
>> +             const char *attr = si->string;
>> +             struct attr_match *am = &item->attr_match[j];
>> +
>> +             if (attr[0] == '!')
>> +                     am->match_mode = MATCH_UNSPECIFIED;
>> +             else if (attr[0] == '-')
>> +                     am->match_mode = MATCH_UNSET;
>> +             else
>> +                     am->match_mode = MATCH_SET;
>> +
>> +             if (am->match_mode != MATCH_SET)
>> +                     /* skip first character */
>> +                     attr++;
>> +             attr_len = strcspn(attr, "=");
>> +             if (attr[attr_len] == '=') {
>> +                     am->match_mode = MATCH_VALUE;
>> +                     am->value = xstrdup(&attr[attr_len + 1]);
>> +                     if (strchr(am->value, '\\'))
>> +                             die(_("attr spec values must not contain backslashes"));
>> +             } else
>> +                     am->value = NULL;
>> +
>
> Let me try a variant to see if we can improve it (thinking aloud).
>
>         switch (*attr) {
>         case '!':
>                 am->match_mode = MATCH_UNSPECIFIED;
>                 attr++;
>                 break;
>         case '-':
>                 am->match_mode = MATCH_UNSET;
>                 attr++;
>                 break;
>         default:
>                 attr_len = strcspn(attr, "=");
>                 if (attr[attr_len] != '=')
>                         am->match_mode = MATCH_SET;
>                 else {
>                         am->match_mode = MATCH_VALUE;
>                         am->value = xstrdup(...);
>                 }
>                 break;
>         }
>
> Using switch/case does not save line counts at all but it may still
> make the result easier to follow.  More importantly, it would help
> you catch "!ATTR=VAR" as an "invalid attribute name 'ATTR=VAR' was
> requested" error by arranging the code to make sure that scanning
> for '=' would not happen in !/- case (in other words, while thiking
> aloud with an alternative way to write the same thing, I spotted a
> bug in the original ;-).

That makes sense.

>
>> +             if (!attr_name_valid(attr, attr_len)) {
>> +                     struct strbuf sb = STRBUF_INIT;
>> +                     am->match_mode = INVALID_ATTR;
>> +                     invalid_attr_name_message(&sb, attr, attr_len);
>> +                     die(_("invalid attribute in '%s': '%s'"), value, sb.buf);
>> +             }
>> +
>> +             am->attr = git_attr_counted(attr, attr_len);
>
> I'd do this the other order, i.e.
>
>         am->attr = git_attr_counted(...);
>         if (!am->attr) {
>                 ...
>                 die(...);
>         }
>
> It is wasteful to call attr_name_valid() yourself before seeing an
> error from git_attr_counted().
>
>> diff --git a/t/t6134-pathspec-with-labels.sh b/t/t6134-pathspec-with-labels.sh
>> new file mode 100755
>> index 0000000..c0d8cda
>> --- /dev/null
>> +++ b/t/t6134-pathspec-with-labels.sh
>> @@ -0,0 +1,166 @@
>> +#!/bin/sh
>> +
>> +test_description='test labels in pathspecs'
>> +. ./test-lib.sh
>> +
>> +test_expect_success 'setup a tree' '
>> +     mkdir sub &&
>> +     for p in fileA fileB fileC fileAB fileAC fileBC fileNoLabel fileUnsetLabel fileSetLabel fileValue fileWrongLabel; do
>> +             : >$p &&
>> +             git add $p &&
>> +             : >sub/$p
>> +             git add sub/$p
>> +     done &&
>> +     git commit -m $p &&
>> +     git ls-files >actual &&
>> +     cat <<EOF >expect &&
>> +fileA
>> +fileAB
>> +fileAC
>> +fileB
>> +fileBC
>> +fileC
>> +fileNoLabel
>> +fileSetLabel
>> +fileUnsetLabel
>> +fileValue
>> +fileWrongLabel
>> +sub/fileA
>> +sub/fileAB
>> +sub/fileAC
>> +sub/fileB
>> +sub/fileBC
>> +sub/fileC
>> +sub/fileNoLabel
>> +sub/fileSetLabel
>> +sub/fileUnsetLabel
>> +sub/fileValue
>> +sub/fileWrongLabel
>> +EOF
>
>         cat <<-\EOF >expect &&
>         fileA
>         ...
>         EOF
>
> to indent the whole thing?

$ grep -r "cat" |grep "<<-"|wc -l
915
$ grep -r "cat" |grep "<<"|grep -v "<<-"| wc -l
1329

I was undecided what the prevailing style is, some did indent,
others did not.

Ok, will indent.

>> +sq="'"
>
> I do not think you use this $sq variable below, but I may have
> overlooked its use.
>

ok

>> +     git ls-files :\(attr:\!label\) >actual &&
>
> Why not the more normal-looking ":(attr:!label)"?  I do not think
> history substitution would trigger in an script.
>
> And no, "I may cut and paste into my interactive shell while
> debugging" is not a good reason to make the end-result (i.e. script)
> less readable.

ok
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html