Re: [PATCH V4 04/10] Use scripting for cleaning and renaming of chains

Eric Blake <eblake@xxxxxxxxxx> · Thu, 17 Nov 2011 10:33:26 -0700

On 10/26/2011 05:36 AM, Stefan Berger wrote:
> Use scripts for the renaming and cleaning up of chains. This allows us to get
> rid of some of the code that is only capable of renaming and removing chains
> whose names are hardcoded.

Resuming where I left off, in the hopes that this helps before you post
v5...

> 
> A shell function 'collect_chains' is introduced that is given the name
> of an ebtables chain and then recursively determines the names of all
> chanins that are accessed from this chain and its sub-chains using 'jumps'.

s/chanins/chains/

> 
> The resulting list of chain names is then used to delete all the found
> chains by first flushing and then deleting them.
> 
> The same function is also used for renaming temporary filters to their final
> names.
> 
> I tested this with the bash and dash as script interpreters.

Well, there's still the overriding design nag that we want to avoid
shell interpretation if at all possible, but that's a _much_ bigger
rewrite for another day, so I can live with this current patch (it's not
making our use of sh any worse than it already was).

>  
> +static const char ebtables_script_func_collect_chains[] =

Reading shell code that has been quoted into a C string literal to be
passed through printf is an interesting exercise :)

> +    "collect_chains()\n"

Making sure I understand the rest of this patch, this function is
reached in two places by the rest of your patch, with code like:
+        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
thus, you are using one command substitution per rootchain, where this
function then outputs words to be parsed as part of a resulting command.

That's a lot of processes; would it be worth using a shell for loop
instead of a command-substitution per rootchain

> +    "{\n"
> +    "  local sc\n"

'local' is not portable; POSIX doesn't guarantee it.  I suppose we can
get away with it since both bash and dash support it, and this code is
specific to Linux where we know /bin/sh will be either bash or dash, but
it would be even nicer if we could stick to POSIX syntax (which means
avoiding 'local' and treating all variables are global, and you have to
be careful that variables used inside a function are not likely to cause
conflicts with any other variables used globally).

> +    "  sc=$(%s -t %s -L $1 | \\\n"

Just so I'm making sure I understand things, this line is paired with
ebtables_cmd_path and EBTABLES_DEFAULT_TABLE, which results in something
like this in shell ($1 being a rootchain name, fitting the pattern
libvirt-%c-%s, so I think you're okay that $1 wasn't quoted):

sc=$(/path/to/iptables -t nat -L $1 | ...

Can you please include a comment mentioning the typical output from such
a command that you intend to be parsing?

Technically, you don't need backslash-newline after pipe; a trailing
pipe is sufficient to continue a statement to the next line in shell.
But it doesn't hurt to leave it in either.

> +    "       sed -n \"/Bridge chain*/,$ s/.*\\-j \\([%s]-.*\\)/\\1/p\")\n"
                                     1  2     3       4      5

1. Did you really mean to match lines with "Bridge chai" or "Bridge
chainnn"?  That * is probably superfluous.

2. "$ " is not portable shell.  To be portable, it should be "\$ " or '$ '.

3. \- is not a portable sed escape.  But you don't need escaping for
'-', since searching for a literal - in a sed expression works just fine.

4. This %s matches up to the 'chains' argument in this code:
+    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);
which I in turn traced back to:
+    char chains[3] = {
+        CHAINPREFIX_HOST_IN_TEMP, // 'J'
+        CHAINPREFIX_HOST_OUT_TEMP, // 'P'
+        0};
so it looks like you are taking a line that looks like:
Bridge chain ... -j J-...
and turning it into:
J-...
while ignoring all other lines.

5. Use of "/\1" in shell is unusual (it's well-defined by POSIX that
the backslash is preserved literally if the next character is not
backquote, dollar, backslash, or newline), but for safety reason, I
prefer to write it as "\$\$/\\1" to make it obvious that in the shell
code I meant a literal backslash rather than relying on POSIX rules.

Put all of these together, and this line should probably be:

"   sed -n \"/Bridge chain/,\\$ s/.*-j \\\\([%s]-.*\\\\)/\\\\1/p\")\n"

> +    "  for tmp in `echo \"$sc\"`; do\n"

I prefer $() over ``.  But this echo is a wasted process -  no need to
echo when you can just use $sc directly:

"  for tmp in $sc; do\n"

> +    "    [ -n \"$tmp\" ] && sc=\"$sc $(collect_chains $tmp)\"\n"

Odd to be assigning to $sc in the middle of a loop based on the contents
of $sc, and problematic if we change $sc to be global instead of local.
 Furthermore, [ -n $tmp ] will always be true (whether with your 'for
tmp in `echo "$sc"`' or my 'for tmp in $sc', the point remains that word
splitting is done, so each value of $tmp will be non-empty and contain
no spaces).

> +    "  done\n"
> +    "  echo \"$sc\"\n"
> +    "}\n";

Ah, so the end result is that you echo each name found, as well as
recurse on each name found to echo any dependent names.  Is order
important (all names from the first chain must be output before any
names from the recursive calls)?  If not, then I can rewrite this
function to avoid local sc in the first place.  Here, in shell form
(I'll leave you to convert it back into C string literal form):

collect_chains()
{
  for tmp in $(iptables -t nat -L $1 | \
        sed -n "/Bridge chain/,\$ s/.*-j \\([JP]-.*\\)/\\1/p"); do
    echo $tmp
    collect_chains $tmp
  done
}

> +
> +static const char ebiptables_script_func_rm_chains[] =
> +    "rm_chains()\n"

It looks like you are calling rm_chains() with a single argument
containing a whitespace separated list of arguments.
+    virBufferAddLit(buf, "rm_chains \"$a\"\n");

Why not just call rm_chains with multiple arguments?
     virBufferAddLit(buf, "rm_chains $a\n");

> +    "{\n"
> +    " for tmp in `echo \"$1\"`; do [ -n \"$tmp\" ] && %s -t %s -F $tmp; done\n"
> +    " for tmp in `echo \"$1\"`; do [ -n \"$tmp\" ] && %s -t %s -X $tmp; done\n"
> +    "}\n";

Is it essential that all chains be flushed before any are deleted?  If
so, then keeping this as two for loops makes sense; if not, then these
could be combined.

Same story about `echo "$1"` being identical to the much simpler $1.

Same story about $tmp always being non-empty in a shell for-loop where
the tokens were created by word-splitting.

With one argument containing a quoted list, I'd write this as:

rm_chains()
{
  for tmp in $1; do %s -t %s -F $tmp; done
  for tmp in $1; do %s -t %s -X $tmp; done
}

with multiple arguments (unquoted $a), I'd write it as:

rm_chains()
{
  for tmp in $*; do %s -t %s -F $tmp; done
  for tmp in $*; do %s -t %s -X $tmp; done
}

<sidenote>
A point that might be worth a separate cleanup patch: it looks awkward
to have to write the literal name of the script and the table name in
multiple places into the script:
+    virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE);

generating:

rm_chains()
{
  for tmp in $*; do /path/to/iptables -t nat -f $tmp; done...
..
rm_chains $a

Why not change your shell script to set up a shell variable up front for
ebtables_cmd_path and EBTABLES_DEFAULT_TABLE, and use that as a shell
variable instead of as a printf substitution?

That is, have your overall script look like:

TABLESCMD=/path/to/iptables
TABLE=nat
rm_chains()
{
  for tmp in $*; do $TABLESCMD -t $TABLE -f $tmp; done
...
rm_chains $a

where the script was generated by:

virBufferAsprintf(buf, "TABLESCMD=%s\nTABLES=%s\n",
                  ebtables_cmd_path, EBTABLES_DEFAULT_TABLE);
virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS);
...
virBufferAddLit(buf, "rm_chains $a\n");

</sidenote>

> +
> +static const char ebiptables_script_func_rename_chains[] =
> +    "rename_chains()\n"
> +    "{\n"
> +    "  for tmp in `echo \"$1\"`; do\n"

Again, `echo "$1"` is overkill for $1,

> +    "    if [ -n \"$tmp\" ]; then\n"

and testing for non-empty $tmp is wasted effort, if the loop is over a
word-split $1.

> +    "      tmp2=`expr substr \"$tmp\" 1 1`\n"

expr substr is not portable to POSIX.  But you don't need it, when a
shell case statement, followed by a POSIX variable expansion, will do
the same.

> +    "      if [ $tmp2 = \"%c\" ]; then\n"
> +    "          %s -t %s -E \"$tmp\" \"%c\"`expr substr \"$tmp\" 2 33`\n"
> +    "      elif [ $tmp2 = \"%c\" ]; then\n"
> +    "          %s -t %s -E \"$tmp\" \"%c\"`expr substr \"$tmp\" 2 33`\n"
> +    "      fi\n"
> +    "    fi\n"
> +    "  done\n"
> +    "}\n";

I'd write this as:

rename_chains()
{
  for tmp in $1; do
    case $tmp in
      J*) /path/to/iptables -t nat -E $tmp I${tmp#?} ;;
      P*) /path/to/iptables -t nat -E $tmp O${tmp#?} ;;
    esac
  done
}

or back in your C-string format:

"rename_chains()\n"
"{\n"
"  for tmp in $1; do\n"
"    case $tmp in\n"
"      %c*) %s -t %s -E $tmp %c${tmp#?} ;;\n"
"      %c*) %s -t %s -E $tmp %c${tmp#?} ;;\n"
"    esac\n"
"  done\n"
"}\n"

> +
> +static const char ebiptables_script_set_ifs[] =
> +    "IFS=$' \\t\\n'\n"
> +    "[ `expr length \"$IFS\"` != 3 ] && "
> +        "IFS=$(echo | %s '{ print \"\\t\\n \"}')\n";

Oy - my head hurts!  $'' is not portable; neither is expr length.  And
doing IFS=$() is just wrong - that eats the trailing newline, leaving
you with just IFS=space-tab instead of the intended
IFS=space-tab-newline.  Just do:

static const char ebiptables_script_set_ifs[] =
    "nl='\n"
    "'; IFS=' ''\t'$nl\n"

and you're done.  No need for shenanigans with command substitution,
expr, or awk.

[Side note - the only reason I write ' ''\t' rather than ' \t' is that I
don't like seeing space-tab in generated scripts; some editors have a
tendency to corrupt space-tab in the name of being "helpful", so
separating the characters avoids that problem.]

> +
> +#define NWFILTER_FUNC_COLLECT_CHAINS ebtables_script_func_collect_chains
> +#define NWFILTER_FUNC_DELETE_CHAINS ebiptables_script_func_rm_chains

This particular change in names made it a bit harder to search the rest
of your patch (I had to search two different strings to see where the
function definition was emitted, vs. where it was called); I'd suggest
either naming the function delete_chains, or the macro
NWFILTER_FUNC_RM_CHAINS, for consistency.

> +#define NWFILTER_FUNC_RENAME_CHAINS ebiptables_script_func_rename_chains
> +#define NWFILTER_FUNC_SET_IFS ebiptables_script_set_ifs
>  
>  #define VIRT_IN_CHAIN      "libvirt-in"
>  #define VIRT_OUT_CHAIN     "libvirt-out"
> @@ -2805,95 +2848,65 @@ ebtablesCreateTmpSubChain(virBufferPtr b
>      return 0;
>  }

Phew - made it past the function definitions.  Now, on to how they are
used.  [By the way, in case it's not obvious with my lengthy review
above, I _like_ the fact that you are using shell functions to
consolidate effort - as long as we are using a scripting language, we
might as well use all of its power - and it does mean that any future
attempt to refactor this away from a temporary shell script into libvirt
primitives will have to keep that in mind]

>  
> -
> -static int
> -_ebtablesRemoveSubChain(virBufferPtr buf,
> -                        int incoming,
> -                        const char *ifname,
> -                        enum l3_proto_idx protoidx,
> -                        int isTempChain)
> +static int _ebtablesRemoveSubChains(virBufferPtr buf,
> +                                    const char *ifname,
> +                                    const char *chains)
>  {
> -    char rootchain[MAX_CHAINNAME_LENGTH], chain[MAX_CHAINNAME_LENGTH];
> -    char chainPrefix;
> -
> -    if (isTempChain) {
> -        chainPrefix =(incoming) ? CHAINPREFIX_HOST_IN_TEMP
> -                                : CHAINPREFIX_HOST_OUT_TEMP;
> -    } else {
> -        chainPrefix =(incoming) ? CHAINPREFIX_HOST_IN
> -                                : CHAINPREFIX_HOST_OUT;
> -    }
> +    char rootchain[MAX_CHAINNAME_LENGTH];
> +    unsigned i;
>  
> -    PRINT_ROOT_CHAIN(rootchain, chainPrefix, ifname);
> -    PRINT_CHAIN(chain, chainPrefix, ifname, l3_protocols[protoidx].val);
> -
> -    virBufferAsprintf(buf,
> -                      "%s -t %s -D %s -p 0x%x -j %s" CMD_SEPARATOR
> -                      "%s -t %s -F %s" CMD_SEPARATOR
> -                      "%s -t %s -X %s" CMD_SEPARATOR,
> +    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);
> +    virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS,
>                        ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
> -                      rootchain, l3_protocols[protoidx].attr, chain,
> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE);
>  
> -                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chain,
> +    virBufferAsprintf(buf, NWFILTER_FUNC_SET_IFS, gawk_cmd_path);

Given my above reformulation of SET_IFS, you don't need gawk_cmd_path here.

> +    virBufferAddLit(buf, "a=\"");
> +    for (i = 0; chains[i] != 0; i++) {
> +        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
> +        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
> +    }
> +    virBufferAddLit(buf, "\"\n");

This results in the shell code:

a="$(collect_chains name) $(collect_chains name) "

If you added looping in the shell definition of collect_chains, this
could instead be:

a=$(collect_chains name name)

Not essential, but food for thought for a smaller script.

>  
> -                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chain);
> +    for (i = 0; chains[i] != 0; i++) {
> +        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
> +        virBufferAsprintf(buf,
> +                          "%s -t %s -F %s\n",
> +                          ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
> +                          rootchain);
> +    }
> +    virBufferAddLit(buf, "rm_chains \"$a\"\n");

This results in:

rm_chains "$a"

See my earlier comments where using $* instead of $1 would let you write
this as:

rm_chains $a

instead.

> +static int
> +ebtablesRenameTmpSubAndRootChains(virBufferPtr buf,
> +                                  const char *ifname)
> +{
> +    char rootchain[MAX_CHAINNAME_LENGTH];
> +    unsigned i;
> +    char chains[3] = {
> +        CHAINPREFIX_HOST_IN_TEMP,
> +        CHAINPREFIX_HOST_OUT_TEMP,
> +        0};
> +
> +    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);

This means you are outputting the function definition from two different
call paths.  It doesn't hurt to (re-)define a function to the same
value, while avoiding the function if it isn't used; on the other hand,
I'm wondering if it is worth outputting the functions as part of the
file header rather than risking having the function definitions appear
multiple times throughout the temporary script.  I can live with your
current approach, so don't bother refactoring things unless you want to.

[I speak from experience - I'm one of the autoconf maintainers, and
solving the problem of how to generate shell script output based on m4
input, where pre-requisite shell functions are emitted at most once, up
front, but used multiple times through the rest of the script, all in
order to reduce the total size of the overall script, proved to be an
interesting challenge in writing a code generator.]

> +    virBufferAsprintf(buf, NWFILTER_FUNC_RENAME_CHAINS,
> +                      CHAINPREFIX_HOST_IN_TEMP,
> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
> +                      CHAINPREFIX_HOST_IN,
> +                      CHAINPREFIX_HOST_OUT_TEMP,
> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
> +                      CHAINPREFIX_HOST_OUT);
> +
> +    virBufferAsprintf(buf, NWFILTER_FUNC_SET_IFS, gawk_cmd_path);
> +    virBufferAddLit(buf, "a=\"");
> +    for (i = 0; chains[i] != 0; i++) {
> +        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
> +        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
>      }

Similar comments about collect_chains usage possibly being easier to
understand if it includes a shell for loop over multiple arguments.

Looking forward to seeing what v5 looks like!

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment:
signature.asc

Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list