On Thu, Sep 1, 2022 at 8:35 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote: > Just generally on this series: > > > + $tag =~ s/['"\\]//g; > > I think this would be a *lot* easier to read if all of these little > regex decls could be split out into some "grammar" class, or other > helper module/namespace. So e.g.: > > my $SCRIPT_QUOTE_RX = qr/['"\\]/; Taken out of context (as in the quoted snippet), it may indeed be difficult to understand what that line is doing, however in context with a meaningful function name: sub scan_heredoc_tag { ... my $tag = $self->scan_token(); $tag =~ s/['"\\]//g; push(@{$self->{heretags}}, $indented ? "\t$tag" : "$tag"); ... } for someone who is familiar with common heredoc tag quoting/escaping (i.e. <<'EOF', <<"EOF", <<\EOF), I find the inline character class `['"\\]` much easier to understand than some opaque name such as $SCRIPT_QUOTE_RX, doubly so because the definition of the named regex might be far removed from the actual code which uses it, which would require going and studying that definition before being able to understand what this code is doing. I grasp you made that name up on-the-fly as an example, but that does highlight another reason why I'd be hesitant to try to pluck out and name these regexes. Specifically, naming is hard and I don't trust that I could come up with succinct meaningful names which would convey what a regex does as well as the actual regex itself conveys what it does. In context within the well-named function, `s/['"\\]//g` is obviously stripping quoting/escaping from the tag name; trying to come up with a succinct yet accurate name to convey that intention is difficult. And this is just one example. The script is littered with little regexes like this, and they are almost all unique, thus making the task of inventing succinct meaningful names extra difficult. And, as noted above, I'm not at all convinced that plucking the regex out of its natural context -- thus making the reader go elsewhere to find the definition of the regex -- would help improve comprehension. > Then: > > > + return $cc if $cc =~ /^(?:&&|\|\||>>|;;|<&|>&|<>|>\|)$/; > > my $SCRIPT_WHATEVER_RX = qr/ > ^(?: > && > | > \|\| > [...] > /x; > > etc., i.e. we could then make use of /x to add inline comments to these. `/x` does make this slightly easier to grok, and this is a an example of a regex which might be easy to name (i.e. $TWO_CHAR_OPERATOR), but -- extra mandatory escaping aside -- it's not hard to understand this one as-is; it's pretty obvious that it's looking for operators `&&`, `||`, `>>`, `;;`, `<&`, `>&`, `<>`, and `>|`.