Re: [PATCH 1/6] Optimize the elements the iterator visits.

Stefan Berger <stefanb@xxxxxxxxxxxxxxxxxx> · Tue, 10 Jan 2012 10:02:01 -0500




On 01/09/2012 07:05 PM, Eric Blake wrote:
On 12/09/2011 08:07 AM, Stefan Berger wrote:
In this patch we introduce testing whether the iterator points to a
unique set of entries that have not been seen before at one of the previous
iterations. The point is to eliminate duplicates and with that unnecessary
filtering rules by preventing identical filtering rules from being
instantiated.
Example with two lists:

list1 = [1,2,1]
list2 = [1,3,1]
Is this something that happens frequently in practice due to user input
(that is, does the user really output list elements more than once), or
something that happens more because of combinatorial combinations as you
expand user inputs into something more precise?  I guess I'd like to
know why such repetitions are likely to occur, although I can agree with
doing less firewall rules if we can determine that the rules are duplicates.

Fair question. It's an optimization that I would admit is not absolutely 
necessary, but reduces the number of rules that are being generated in 
case duplicates would occur. Doing this calculation upfront rather than 
having duplicate rules evaluated on possibly thousands of packets seems 
worth the effort. The algorithm is rather short but maybe not as 
efficient as it should be.

  /*
+ * Test whether the iterator entry points to a distinguished set of entries
+ * that have not been seen before at one of the previous iterations.
+ *
+ * The point of this function is to eliminate duplicates.
+ * Example with two lists:
+ *
+ * list1 = [1,2,1]
+ * list2 = [1,3,1]
+ *
+ * The 1st iteration would take the 1st items of each list ->  1,1
+ * The 2nd iteration would take the 2nd items of each list ->  2,3
+ * The 3rd iteration would take the 3rd items of each list ->  1,1 but
+ * skip them since this pair has already been encountered in the 1st iteration
+ */
+static bool
+virNWFilterVarCombIterEntryAreUniqueEntries(virNWFilterVarCombIterEntryPtr cie,
+                                            virNWFilterHashTablePtr hash)
+{
+    unsigned int i, j;
+    virNWFilterVarValuePtr varValue, tmp;
+    const char *value;
+
+    varValue = virHashLookup(hash->hashTable, cie->varNames[0]);
+
+    value = virNWFilterVarValueGetNthValue(varValue, cie->curValue);
+
+    for (i = 0; i<  cie->curValue; i++) {
+        if (STREQ(value, virNWFilterVarValueGetNthValue(varValue, i))) {
+            bool isSame = true;
This function is called in a loop over the list, and itself does a loop
over the list, making it a quadratic check for duplicates.  Am I correct
that lists can provided by the user with arbitrary lengths?

Each variable can have an number of elements >= 1, for example 
A=[10,20,30,20].

+            for (j = 1; j<  cie->nVarNames; j++) {
Furthermore, we add another layer of looping (this time over the number
of parallel lists being combined), for an overall cubic algorithm.  Is
the number of parallel lists that can be provided by the user arbitrary?

The number of lists depends on the number of variables that are provided 
in one rule. So, $A, $B, and $C in one rule in different fields like 
this one below has 3 different lists (where the iterators ('@<n>') only 
change the behaviour for how the instantiated rules are generated):

<rule action='accept' direction='out'>
<tcp srcipaddr='$A[@1]' srcportstart='$B[@2]' dstportstart='$C[@3]'
           dscp='4'/>
</rule>

That is, I'm wondering if we need to start worrying about reducing
complexity, and coming up with an algorithm that isn't quite so brute
force so as not to take quite so long on arbitrary-length inputs.

In case of no duplicates in $A, we have a complexity of O(n) [optimal 
case] for checking of a single rule (only 'vertical' checking, no 
'horizontal' checking)

In case $A has duplicates the complexity then becomes O(n^2) in the 
worst case since we do horizonal checking accross $B and $C as well.

I think the elements in $A for example would have to be sorted for 
faster detection of duplicates. A = [10,20,30,20] -> A = [10,20,20,30]. 
Now if I step on A[2], I would see that A[2-1] = A[2] and wouldn't have 
to look at A[0] at all. So that would be O[1]. But sorting the elements 
may alter their meaning in terms of the sequential processing of the 
rules. Got to think about this...

On the other hand, until someone actually demonstrates slow input, I'm
okay with making assumptions that most users won't be merging large
amounts of long parallel lists, and that the overhead of reducing
algorithmic complexity may cost more overhead than all expected use
cases just being brute-forced.  So I'm okay checking this in as-is,
until (and unless) we have performance numbers proving that we need to
be smarter.

OK.
@@ -414,10 +460,15 @@ virNWFilterVarCombIterNext(virNWFilterVa
      unsigned int i;

      for (i = 0; i<  ci->nIter; i++) {
+next:
          ci->iter[i].curValue++;
-        if (ci->iter[i].curValue<= ci->iter[i].maxValue)
+        if (ci->iter[i].curValue<= ci->iter[i].maxValue) {
+            if (!virNWFilterVarCombIterEntryAreUniqueEntries(
+&ci->iter[i], ci->hashTable)) {
+                goto next;
Why do we need a backwards goto, when a 'continue' would get to the same
location?

Using the goto we don't increase 'i' but only do ci->iter[i].curValue++, 
which is what is needed.


+            }
              break;
-        else
+        } else
              ci->iter[i].curValue = 0;
      }
Coding style - the else needs {} since you added {} for the if.
Fixed.

I think this is okay, but wait until I review the rest of the series to
see how it was used in context.

I'll wait. Thanks.

   Stefan

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list