On 10/26/2011 05:36 AM, Stefan Berger wrote: > Use scripts for the renaming and cleaning up of chains. This allows us to get > rid of some of the code that is only capable of renaming and removing chains > whose names are hardcoded. Resuming where I left off, in the hopes that this helps before you post v5... > > A shell function 'collect_chains' is introduced that is given the name > of an ebtables chain and then recursively determines the names of all > chanins that are accessed from this chain and its sub-chains using 'jumps'. s/chanins/chains/ > > The resulting list of chain names is then used to delete all the found > chains by first flushing and then deleting them. > > The same function is also used for renaming temporary filters to their final > names. > > I tested this with the bash and dash as script interpreters. Well, there's still the overriding design nag that we want to avoid shell interpretation if at all possible, but that's a _much_ bigger rewrite for another day, so I can live with this current patch (it's not making our use of sh any worse than it already was). > > +static const char ebtables_script_func_collect_chains[] = Reading shell code that has been quoted into a C string literal to be passed through printf is an interesting exercise :) > + "collect_chains()\n" Making sure I understand the rest of this patch, this function is reached in two places by the rest of your patch, with code like: + virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain); thus, you are using one command substitution per rootchain, where this function then outputs words to be parsed as part of a resulting command. That's a lot of processes; would it be worth using a shell for loop instead of a command-substitution per rootchain > + "{\n" > + " local sc\n" 'local' is not portable; POSIX doesn't guarantee it. I suppose we can get away with it since both bash and dash support it, and this code is specific to Linux where we know /bin/sh will be either bash or dash, but it would be even nicer if we could stick to POSIX syntax (which means avoiding 'local' and treating all variables are global, and you have to be careful that variables used inside a function are not likely to cause conflicts with any other variables used globally). > + " sc=$(%s -t %s -L $1 | \\\n" Just so I'm making sure I understand things, this line is paired with ebtables_cmd_path and EBTABLES_DEFAULT_TABLE, which results in something like this in shell ($1 being a rootchain name, fitting the pattern libvirt-%c-%s, so I think you're okay that $1 wasn't quoted): sc=$(/path/to/iptables -t nat -L $1 | ... Can you please include a comment mentioning the typical output from such a command that you intend to be parsing? Technically, you don't need backslash-newline after pipe; a trailing pipe is sufficient to continue a statement to the next line in shell. But it doesn't hurt to leave it in either. > + " sed -n \"/Bridge chain*/,$ s/.*\\-j \\([%s]-.*\\)/\\1/p\")\n" 1 2 3 4 5 1. Did you really mean to match lines with "Bridge chai" or "Bridge chainnn"? That * is probably superfluous. 2. "$ " is not portable shell. To be portable, it should be "\$ " or '$ '. 3. \- is not a portable sed escape. But you don't need escaping for '-', since searching for a literal - in a sed expression works just fine. 4. This %s matches up to the 'chains' argument in this code: + virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS, + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains); which I in turn traced back to: + char chains[3] = { + CHAINPREFIX_HOST_IN_TEMP, // 'J' + CHAINPREFIX_HOST_OUT_TEMP, // 'P' + 0}; so it looks like you are taking a line that looks like: Bridge chain ... -j J-... and turning it into: J-... while ignoring all other lines. 5. Use of "\(\)/\1" in shell is unusual (it's well-defined by POSIX that the backslash is preserved literally if the next character is not backquote, dollar, backslash, or newline), but for safety reason, I prefer to write it as "\\(\\)/\\1" to make it obvious that in the shell code I meant a literal backslash rather than relying on POSIX rules. Put all of these together, and this line should probably be: " sed -n \"/Bridge chain/,\\$ s/.*-j \\\\([%s]-.*\\\\)/\\\\1/p\")\n" > + " for tmp in `echo \"$sc\"`; do\n" I prefer $() over ``. But this echo is a wasted process - no need to echo when you can just use $sc directly: " for tmp in $sc; do\n" > + " [ -n \"$tmp\" ] && sc=\"$sc $(collect_chains $tmp)\"\n" Odd to be assigning to $sc in the middle of a loop based on the contents of $sc, and problematic if we change $sc to be global instead of local. Furthermore, [ -n $tmp ] will always be true (whether with your 'for tmp in `echo "$sc"`' or my 'for tmp in $sc', the point remains that word splitting is done, so each value of $tmp will be non-empty and contain no spaces). > + " done\n" > + " echo \"$sc\"\n" > + "}\n"; Ah, so the end result is that you echo each name found, as well as recurse on each name found to echo any dependent names. Is order important (all names from the first chain must be output before any names from the recursive calls)? If not, then I can rewrite this function to avoid local sc in the first place. Here, in shell form (I'll leave you to convert it back into C string literal form): collect_chains() { for tmp in $(iptables -t nat -L $1 | \ sed -n "/Bridge chain/,\$ s/.*-j \\([JP]-.*\\)/\\1/p"); do echo $tmp collect_chains $tmp done } > + > +static const char ebiptables_script_func_rm_chains[] = > + "rm_chains()\n" It looks like you are calling rm_chains() with a single argument containing a whitespace separated list of arguments. + virBufferAddLit(buf, "rm_chains \"$a\"\n"); Why not just call rm_chains with multiple arguments? virBufferAddLit(buf, "rm_chains $a\n"); > + "{\n" > + " for tmp in `echo \"$1\"`; do [ -n \"$tmp\" ] && %s -t %s -F $tmp; done\n" > + " for tmp in `echo \"$1\"`; do [ -n \"$tmp\" ] && %s -t %s -X $tmp; done\n" > + "}\n"; Is it essential that all chains be flushed before any are deleted? If so, then keeping this as two for loops makes sense; if not, then these could be combined. Same story about `echo "$1"` being identical to the much simpler $1. Same story about $tmp always being non-empty in a shell for-loop where the tokens were created by word-splitting. With one argument containing a quoted list, I'd write this as: rm_chains() { for tmp in $1; do %s -t %s -F $tmp; done for tmp in $1; do %s -t %s -X $tmp; done } with multiple arguments (unquoted $a), I'd write it as: rm_chains() { for tmp in $*; do %s -t %s -F $tmp; done for tmp in $*; do %s -t %s -X $tmp; done } <sidenote> A point that might be worth a separate cleanup patch: it looks awkward to have to write the literal name of the script and the table name in multiple places into the script: + virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS, + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE); generating: rm_chains() { for tmp in $*; do /path/to/iptables -t nat -f $tmp; done... .. rm_chains $a Why not change your shell script to set up a shell variable up front for ebtables_cmd_path and EBTABLES_DEFAULT_TABLE, and use that as a shell variable instead of as a printf substitution? That is, have your overall script look like: TABLESCMD=/path/to/iptables TABLE=nat rm_chains() { for tmp in $*; do $TABLESCMD -t $TABLE -f $tmp; done ... rm_chains $a where the script was generated by: virBufferAsprintf(buf, "TABLESCMD=%s\nTABLES=%s\n", ebtables_cmd_path, EBTABLES_DEFAULT_TABLE); virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS); ... virBufferAddLit(buf, "rm_chains $a\n"); </sidenote> > + > +static const char ebiptables_script_func_rename_chains[] = > + "rename_chains()\n" > + "{\n" > + " for tmp in `echo \"$1\"`; do\n" Again, `echo "$1"` is overkill for $1, > + " if [ -n \"$tmp\" ]; then\n" and testing for non-empty $tmp is wasted effort, if the loop is over a word-split $1. > + " tmp2=`expr substr \"$tmp\" 1 1`\n" expr substr is not portable to POSIX. But you don't need it, when a shell case statement, followed by a POSIX variable expansion, will do the same. > + " if [ $tmp2 = \"%c\" ]; then\n" > + " %s -t %s -E \"$tmp\" \"%c\"`expr substr \"$tmp\" 2 33`\n" > + " elif [ $tmp2 = \"%c\" ]; then\n" > + " %s -t %s -E \"$tmp\" \"%c\"`expr substr \"$tmp\" 2 33`\n" > + " fi\n" > + " fi\n" > + " done\n" > + "}\n"; I'd write this as: rename_chains() { for tmp in $1; do case $tmp in J*) /path/to/iptables -t nat -E $tmp I${tmp#?} ;; P*) /path/to/iptables -t nat -E $tmp O${tmp#?} ;; esac done } or back in your C-string format: "rename_chains()\n" "{\n" " for tmp in $1; do\n" " case $tmp in\n" " %c*) %s -t %s -E $tmp %c${tmp#?} ;;\n" " %c*) %s -t %s -E $tmp %c${tmp#?} ;;\n" " esac\n" " done\n" "}\n" > + > +static const char ebiptables_script_set_ifs[] = > + "IFS=$' \\t\\n'\n" > + "[ `expr length \"$IFS\"` != 3 ] && " > + "IFS=$(echo | %s '{ print \"\\t\\n \"}')\n"; Oy - my head hurts! $'' is not portable; neither is expr length. And doing IFS=$() is just wrong - that eats the trailing newline, leaving you with just IFS=space-tab instead of the intended IFS=space-tab-newline. Just do: static const char ebiptables_script_set_ifs[] = "nl='\n" "'; IFS=' ''\t'$nl\n" and you're done. No need for shenanigans with command substitution, expr, or awk. [Side note - the only reason I write ' ''\t' rather than ' \t' is that I don't like seeing space-tab in generated scripts; some editors have a tendency to corrupt space-tab in the name of being "helpful", so separating the characters avoids that problem.] > + > +#define NWFILTER_FUNC_COLLECT_CHAINS ebtables_script_func_collect_chains > +#define NWFILTER_FUNC_DELETE_CHAINS ebiptables_script_func_rm_chains This particular change in names made it a bit harder to search the rest of your patch (I had to search two different strings to see where the function definition was emitted, vs. where it was called); I'd suggest either naming the function delete_chains, or the macro NWFILTER_FUNC_RM_CHAINS, for consistency. > +#define NWFILTER_FUNC_RENAME_CHAINS ebiptables_script_func_rename_chains > +#define NWFILTER_FUNC_SET_IFS ebiptables_script_set_ifs > > #define VIRT_IN_CHAIN "libvirt-in" > #define VIRT_OUT_CHAIN "libvirt-out" > @@ -2805,95 +2848,65 @@ ebtablesCreateTmpSubChain(virBufferPtr b > return 0; > } Phew - made it past the function definitions. Now, on to how they are used. [By the way, in case it's not obvious with my lengthy review above, I _like_ the fact that you are using shell functions to consolidate effort - as long as we are using a scripting language, we might as well use all of its power - and it does mean that any future attempt to refactor this away from a temporary shell script into libvirt primitives will have to keep that in mind] > > - > -static int > -_ebtablesRemoveSubChain(virBufferPtr buf, > - int incoming, > - const char *ifname, > - enum l3_proto_idx protoidx, > - int isTempChain) > +static int _ebtablesRemoveSubChains(virBufferPtr buf, > + const char *ifname, > + const char *chains) > { > - char rootchain[MAX_CHAINNAME_LENGTH], chain[MAX_CHAINNAME_LENGTH]; > - char chainPrefix; > - > - if (isTempChain) { > - chainPrefix =(incoming) ? CHAINPREFIX_HOST_IN_TEMP > - : CHAINPREFIX_HOST_OUT_TEMP; > - } else { > - chainPrefix =(incoming) ? CHAINPREFIX_HOST_IN > - : CHAINPREFIX_HOST_OUT; > - } > + char rootchain[MAX_CHAINNAME_LENGTH]; > + unsigned i; > > - PRINT_ROOT_CHAIN(rootchain, chainPrefix, ifname); > - PRINT_CHAIN(chain, chainPrefix, ifname, l3_protocols[protoidx].val); > - > - virBufferAsprintf(buf, > - "%s -t %s -D %s -p 0x%x -j %s" CMD_SEPARATOR > - "%s -t %s -F %s" CMD_SEPARATOR > - "%s -t %s -X %s" CMD_SEPARATOR, > + virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS, > + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains); > + virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS, > ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, > - rootchain, l3_protocols[protoidx].attr, chain, > + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE); > > - ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chain, > + virBufferAsprintf(buf, NWFILTER_FUNC_SET_IFS, gawk_cmd_path); Given my above reformulation of SET_IFS, you don't need gawk_cmd_path here. > + virBufferAddLit(buf, "a=\""); > + for (i = 0; chains[i] != 0; i++) { > + PRINT_ROOT_CHAIN(rootchain, chains[i], ifname); > + virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain); > + } > + virBufferAddLit(buf, "\"\n"); This results in the shell code: a="$(collect_chains name) $(collect_chains name) " If you added looping in the shell definition of collect_chains, this could instead be: a=$(collect_chains name name) Not essential, but food for thought for a smaller script. > > - ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chain); > + for (i = 0; chains[i] != 0; i++) { > + PRINT_ROOT_CHAIN(rootchain, chains[i], ifname); > + virBufferAsprintf(buf, > + "%s -t %s -F %s\n", > + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, > + rootchain); > + } > + virBufferAddLit(buf, "rm_chains \"$a\"\n"); This results in: rm_chains "$a" See my earlier comments where using $* instead of $1 would let you write this as: rm_chains $a instead. > +static int > +ebtablesRenameTmpSubAndRootChains(virBufferPtr buf, > + const char *ifname) > +{ > + char rootchain[MAX_CHAINNAME_LENGTH]; > + unsigned i; > + char chains[3] = { > + CHAINPREFIX_HOST_IN_TEMP, > + CHAINPREFIX_HOST_OUT_TEMP, > + 0}; > + > + virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS, > + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains); This means you are outputting the function definition from two different call paths. It doesn't hurt to (re-)define a function to the same value, while avoiding the function if it isn't used; on the other hand, I'm wondering if it is worth outputting the functions as part of the file header rather than risking having the function definitions appear multiple times throughout the temporary script. I can live with your current approach, so don't bother refactoring things unless you want to. [I speak from experience - I'm one of the autoconf maintainers, and solving the problem of how to generate shell script output based on m4 input, where pre-requisite shell functions are emitted at most once, up front, but used multiple times through the rest of the script, all in order to reduce the total size of the overall script, proved to be an interesting challenge in writing a code generator.] > + virBufferAsprintf(buf, NWFILTER_FUNC_RENAME_CHAINS, > + CHAINPREFIX_HOST_IN_TEMP, > + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, > + CHAINPREFIX_HOST_IN, > + CHAINPREFIX_HOST_OUT_TEMP, > + ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, > + CHAINPREFIX_HOST_OUT); > + > + virBufferAsprintf(buf, NWFILTER_FUNC_SET_IFS, gawk_cmd_path); > + virBufferAddLit(buf, "a=\""); > + for (i = 0; chains[i] != 0; i++) { > + PRINT_ROOT_CHAIN(rootchain, chains[i], ifname); > + virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain); > } Similar comments about collect_chains usage possibly being easier to understand if it includes a shell for loop over multiple arguments. Looking forward to seeing what v5 looks like! -- Eric Blake eblake@xxxxxxxxxx +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list