Re: [PATCH V4 04/10] Use scripting for cleaning and renaming of chains

Stefan Berger <stefanb@xxxxxxxxxxxxxxxxxx> · Thu, 17 Nov 2011 14:59:49 -0500




On 11/17/2011 12:33 PM, Eric Blake wrote:
On 10/26/2011 05:36 AM, Stefan Berger wrote:
The resulting list of chain names is then used to delete all the found
chains by first flushing and then deleting them.

The same function is also used for renaming temporary filters to their final
names.

I tested this with the bash and dash as script interpreters.
Well, there's still the overriding design nag that we want to avoid
shell interpretation if at all possible, but that's a _much_ bigger
rewrite for another day, so I can live with this current patch (it's not
making our use of sh any worse than it already was).

Agree.

+static const char ebtables_script_func_collect_chains[] =
Reading shell code that has been quoted into a C string literal to be
passed through printf is an interesting exercise :)

+    "collect_chains()\n"
Making sure I understand the rest of this patch, this function is
reached in two places by the rest of your patch, with code like:
+        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
thus, you are using one command substitution per rootchain, where this
function then outputs words to be parsed as part of a resulting command.

Yes, so this function starts out with the name of an ebtables chain and 
tries to determine all dependent chains of it, i.e., 'subchains'.

 #> ebtables -t nat -L libvirt-I-tck-test205002
 Bridge table: nat

 Bridge chain: libvirt-I-tck-test205002, entries: 5, policy: ACCEPT
 -p IPv4 -j I-tck-test205002-ipv4
 -p ARP -j I-tck-test205002-arp
 -p 0x8035 -j I-tck-test205002-rarp
 -p 0x835 -j ACCEPT
 -j DROP

Assuming I have the interface 'tck-test205002', I then run this function 
to find the chains
I-tck-test205002-ipv4, I-tck-test205002-arp and I-tck-test-205002-rarp 
and then visit
each one of those and try to find the chains they are 'jumping into'.

That's a lot of processes; would it be worth using a shell for loop
instead of a command-substitution per rootchain

I followed your suggested code below.

[...]

sc=$(/path/to/iptables -t nat -L $1 | ...

Can you please include a comment mentioning the typical output from such
a command that you intend to be parsing?

See above (uses ebtables rather than iptables).
Technically, you don't need backslash-newline after pipe; a trailing
pipe is sufficient to continue a statement to the next line in shell.
But it doesn't hurt to leave it in either.

+    "       sed -n \"/Bridge chain*/,$ s/.*\\-j \\([%s]-.*\\)/\\1/p\")\n"
                                      1  2     3       4      5

1. Did you really mean to match lines with "Bridge chai" or "Bridge
chainnn"?  That * is probably superfluous.

No, that was wrong.
2. "$ " is not portable shell.  To be portable, it should be "\$ " or '$ '.

Didn't know...
3. \- is not a portable sed escape.  But you don't need escaping for
'-', since searching for a literal - in a sed expression works just fine.

4. This %s matches up to the 'chains' argument in this code:
+    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);
which I in turn traced back to:
+    char chains[3] = {
+        CHAINPREFIX_HOST_IN_TEMP, // 'J'
+        CHAINPREFIX_HOST_OUT_TEMP, // 'P'
+        0};
so it looks like you are taking a line that looks like:
Bridge chain ... -j J-...
and turning it into:
J-...
while ignoring all other lines.

5. Use of "\(\)/\1" in shell is unusual (it's well-defined by POSIX that
the backslash is preserved literally if the next character is not
backquote, dollar, backslash, or newline), but for safety reason, I
prefer to write it as "\\(\\)/\\1" to make it obvious that in the shell
code I meant a literal backslash rather than relying on POSIX rules.

Put all of these together, and this line should probably be:

"   sed -n \"/Bridge chain/,\\$ s/.*-j \\\\([%s]-.*\\\\)/\\\\1/p\")\n"

Thanks.
Ah, so the end result is that you echo each name found, as well as
recurse on each name found to echo any dependent names.  Is order
important (all names from the first chain must be output before any
names from the recursive calls)?  If not, then I can rewrite this
function to avoid local sc in the first place.  Here, in shell form
(I'll leave you to convert it back into C string literal form):

collect_chains()
{
   for tmp in $(iptables -t nat -L $1 | \
         sed -n "/Bridge chain/,\$ s/.*-j \\([JP]-.*\\)/\\1/p"); do
     echo $tmp
     collect_chains $tmp
   done
}


I took this 'as-is'. Thanks.
+
+static const char ebiptables_script_func_rm_chains[] =
+    "rm_chains()\n"
It looks like you are calling rm_chains() with a single argument
containing a whitespace separated list of arguments.
+    virBufferAddLit(buf, "rm_chains \"$a\"\n");

Why not just call rm_chains with multiple arguments?
      virBufferAddLit(buf, "rm_chains $a\n");

+    "{\n"
+    " for tmp in `echo \"$1\"`; do [ -n \"$tmp\" ]&&  %s -t %s -F $tmp; done\n"
+    " for tmp in `echo \"$1\"`; do [ -n \"$tmp\" ]&&  %s -t %s -X $tmp; done\n"
+    "}\n";
Is it essential that all chains be flushed before any are deleted?  If
If they are flushed first they cannot reference each other. So that 
prevents not being able to delete one because it's still referenced by 
another one and then also the order in which the names are being 
processed doesn't matter as much.
so, then keeping this as two for loops makes sense; if not, then these
could be combined.

Same story about `echo "$1"` being identical to the much simpler $1.

Same story about $tmp always being non-empty in a shell for-loop where
the tokens were created by word-splitting.

I had a lot of problems when adding support for dash to it. Probably the 
IFS was wrong at some point and so I added lots of [ -n $tmp ].
With one argument containing a quoted list, I'd write this as:

rm_chains()
{
   for tmp in $1; do %s -t %s -F $tmp; done
   for tmp in $1; do %s -t %s -X $tmp; done
}

with multiple arguments (unquoted $a), I'd write it as:

rm_chains()
{
   for tmp in $*; do %s -t %s -F $tmp; done
   for tmp in $*; do %s -t %s -X $tmp; done
}

I took the 2nd version.

<sidenote>
A point that might be worth a separate cleanup patch: it looks awkward
to have to write the literal name of the script and the table name in
multiple places into the script:
+    virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE);

generating:

rm_chains()
{
   for tmp in $*; do /path/to/iptables -t nat -f $tmp; done...
..
rm_chains $a


Why not change your shell script to set up a shell variable up front for
ebtables_cmd_path and EBTABLES_DEFAULT_TABLE, and use that as a shell
variable instead of as a printf substitution?
As you mentioned, let's defer this to another patch.
I'd write this as:
rename_chains()
{
   for tmp in $1; do
     case $tmp in
       J*) /path/to/iptables -t nat -E $tmp I${tmp#?} ;;
       P*) /path/to/iptables -t nat -E $tmp O${tmp#?} ;;
     esac
   done
}

or back in your C-string format:

"rename_chains()\n"
"{\n"
"  for tmp in $1; do\n"
"    case $tmp in\n"
"      %c*) %s -t %s -E $tmp %c${tmp#?} ;;\n"
"      %c*) %s -t %s -E $tmp %c${tmp#?} ;;\n"
"    esac\n"
"  done\n"
"}\n"

Took this 'as-is'.
+
+static const char ebiptables_script_set_ifs[] =
+    "IFS=$' \\t\\n'\n"
+    "[ `expr length \"$IFS\"` != 3 ]&&  "
+        "IFS=$(echo | %s '{ print \"\\t\\n \"}')\n";
Oy - my head hurts!  $'' is not portable; neither is expr length.  And
sorry for this unwanted 'side-effect' :-)

doing IFS=$() is just wrong - that eats the trailing newline, leaving
you with just IFS=space-tab instead of the intended
IFS=space-tab-newline.  Just do:

static const char ebiptables_script_set_ifs[] =
     "nl='\n"
     "'; IFS=' ''\t'$nl\n"

and you're done.  No need for shenanigans with command substitution,
expr, or awk.
I took it as-is now. Thanks.
[Side note - the only reason I write ' ''\t' rather than ' \t' is that I
don't like seeing space-tab in generated scripts; some editors have a
tendency to corrupt space-tab in the name of being "helpful", so
separating the characters avoids that problem.]

+
+#define NWFILTER_FUNC_COLLECT_CHAINS ebtables_script_func_collect_chains
+#define NWFILTER_FUNC_DELETE_CHAINS ebiptables_script_func_rm_chains
This particular change in names made it a bit harder to search the rest
of your patch (I had to search two different strings to see where the
function definition was emitted, vs. where it was called); I'd suggest
either naming the function delete_chains, or the macro
NWFILTER_FUNC_RM_CHAINS, for consistency.
Fixed.
+#define NWFILTER_FUNC_RENAME_CHAINS ebiptables_script_func_rename_chains
+#define NWFILTER_FUNC_SET_IFS ebiptables_script_set_ifs

  #define VIRT_IN_CHAIN      "libvirt-in"
  #define VIRT_OUT_CHAIN     "libvirt-out"
@@ -2805,95 +2848,65 @@ ebtablesCreateTmpSubChain(virBufferPtr b
      return 0;
  }
Phew - made it past the function definitions.  Now, on to how they are
used.  [By the way, in case it's not obvious with my lengthy review
above, I _like_ the fact that you are using shell functions to
consolidate effort - as long as we are using a scripting language, we
might as well use all of its power - and it does mean that any future
attempt to refactor this away from a temporary shell script into libvirt
primitives will have to keep that in mind]


-
-static int
-_ebtablesRemoveSubChain(virBufferPtr buf,
-                        int incoming,
-                        const char *ifname,
-                        enum l3_proto_idx protoidx,
-                        int isTempChain)
+static int _ebtablesRemoveSubChains(virBufferPtr buf,
+                                    const char *ifname,
+                                    const char *chains)
  {
-    char rootchain[MAX_CHAINNAME_LENGTH], chain[MAX_CHAINNAME_LENGTH];
-    char chainPrefix;
-
-    if (isTempChain) {
-        chainPrefix =(incoming) ? CHAINPREFIX_HOST_IN_TEMP
-                                : CHAINPREFIX_HOST_OUT_TEMP;
-    } else {
-        chainPrefix =(incoming) ? CHAINPREFIX_HOST_IN
-                                : CHAINPREFIX_HOST_OUT;
-    }
+    char rootchain[MAX_CHAINNAME_LENGTH];
+    unsigned i;

-    PRINT_ROOT_CHAIN(rootchain, chainPrefix, ifname);
-    PRINT_CHAIN(chain, chainPrefix, ifname, l3_protocols[protoidx].val);
-
-    virBufferAsprintf(buf,
-                      "%s -t %s -D %s -p 0x%x -j %s" CMD_SEPARATOR
-                      "%s -t %s -F %s" CMD_SEPARATOR
-                      "%s -t %s -X %s" CMD_SEPARATOR,
+    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);
+    virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS,
                        ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
-                      rootchain, l3_protocols[protoidx].attr, chain,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE);

-                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chain,
+    virBufferAsprintf(buf, NWFILTER_FUNC_SET_IFS, gawk_cmd_path);
Given my above reformulation of SET_IFS, you don't need gawk_cmd_path here.

Fixed.
+    virBufferAddLit(buf, "a=\"");
+    for (i = 0; chains[i] != 0; i++) {
+        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
+        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
+    }
+    virBufferAddLit(buf, "\"\n");
This results in the shell code:

a="$(collect_chains name) $(collect_chains name) "

If you added looping in the shell definition of collect_chains, this
could instead be:

a=$(collect_chains name name)

Not essential, but food for thought for a smaller script.
Did that now.

-                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chain);
+    for (i = 0; chains[i] != 0; i++) {
+        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
+        virBufferAsprintf(buf,
+                          "%s -t %s -F %s\n",
+                          ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
+                          rootchain);
+    }
+    virBufferAddLit(buf, "rm_chains \"$a\"\n");
This results in:

rm_chains "$a"

See my earlier comments where using $* instead of $1 would let you write
this as:

rm_chains $a

instead.

Using the 'rm_chains $a' now.
+static int
+ebtablesRenameTmpSubAndRootChains(virBufferPtr buf,
+                                  const char *ifname)
+{
+    char rootchain[MAX_CHAINNAME_LENGTH];
+    unsigned i;
+    char chains[3] = {
+        CHAINPREFIX_HOST_IN_TEMP,
+        CHAINPREFIX_HOST_OUT_TEMP,
+        0};
+
+    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);
This means you are outputting the function definition from two different
call paths.  It doesn't hurt to (re-)define a function to the same
value, while avoiding the function if it isn't used; on the other hand,
I'm wondering if it is worth outputting the functions as part of the
file header rather than risking having the function definitions appear
multiple times throughout the temporary script.  I can live with your
current approach, so don't bother refactoring things unless you want to.
Then I don't refactor this one.
[I speak from experience - I'm one of the autoconf maintainers, and
solving the problem of how to generate shell script output based on m4
input, where pre-requisite shell functions are emitted at most once, up
front, but used multiple times through the rest of the script, all in
order to reduce the total size of the overall script, proved to be an
interesting challenge in writing a code generator.]

+    virBufferAsprintf(buf, NWFILTER_FUNC_RENAME_CHAINS,
+                      CHAINPREFIX_HOST_IN_TEMP,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
+                      CHAINPREFIX_HOST_IN,
+                      CHAINPREFIX_HOST_OUT_TEMP,
+                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
+                      CHAINPREFIX_HOST_OUT);
+
+    virBufferAsprintf(buf, NWFILTER_FUNC_SET_IFS, gawk_cmd_path);
+    virBufferAddLit(buf, "a=\"");
+    for (i = 0; chains[i] != 0; i++) {
+        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
+        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
      }
Similar comments about collect_chains usage possibly being easier to
understand if it includes a shell for loop over multiple arguments.

Converted.
Looking forward to seeing what v5 looks like!

I did all conversions and libvirt-tck test cases pass without problems 
or leaving any tables behind upon VM teardown.

Thanks a lot for your review and help. I'll send out v5 shortly.

   Stefan

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list