Re: [PATCH 1/6] Optimize the elements the iterator visits.

On 12/09/2011 08:07 AM, Stefan Berger wrote:
> In this patch we introduce a test of whether the iterator points to a
> unique set of entries that has not been seen at any previous
> iteration. The point is to eliminate duplicates, and with them
> unnecessary filtering rules, by preventing identical filtering rules
> from being instantiated.
> Example with two lists:
> 
> list1 = [1,2,1]
> list2 = [1,3,1]

Is this something that happens frequently in practice due to user input
(that is, do users really list elements more than once), or something
that happens more because of combinatorial expansion as you turn user
input into something more precise?  I'd like to know why such
repetitions are likely to occur, although I can agree with creating
fewer firewall rules if we can determine that the rules are duplicates.

> 
> The 1st iteration would take the 1st items of each list -> 1,1
> The 2nd iteration would take the 2nd items of each list -> 2,3
> The 3rd iteration would take the 3rd items of each list -> 1,1 but
> skip them since this same pair has already been encountered in the 1st
> iteration
> 
> Implementation-wise this is solved by taking the n-th element of list1 and
> comparing it against elements 1..n-1. If no equivalent is found, then there
> is no possibility of this being a duplicate. In case an equivalent element
> is found at position i, then the n-th element in the 2nd list is compared
> against the i-th element in the 2nd list and if that is not the same, then
> this is a unique pair, otherwise it is not unique and we may need to do
> the same comparison on the 3rd list.
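
Just to make sure I'm reading the description right, the check boils
down to something like this standalone restatement (my own sketch, not
the patch code; tupleIsUnique and the list layout are hypothetical):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A tuple at position n is a duplicate iff some earlier position i
 * holds the same value in every parallel list. */
static bool
tupleIsUnique(const char **lists[], size_t nLists, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        bool allSame = true;

        if (strcmp(lists[0][n], lists[0][i]) != 0)
            continue;                /* first list already differs at i */

        /* first list matches at i; check the remaining lists */
        for (j = 1; j < nLists; j++) {
            if (strcmp(lists[j][n], lists[j][i]) != 0) {
                allSame = false;
                break;
            }
        }
        if (allSame)
            return false;            /* identical tuple already seen at i */
    }
    return true;
}

int main(void)
{
    const char *list1[] = { "1", "2", "1" };
    const char *list2[] = { "1", "3", "1" };
    const char **lists[] = { list1, list2 };
    size_t n;

    for (n = 0; n < 3; n++)          /* prints: unique, unique, duplicate */
        printf("position %zu: %s\n", n,
               tupleIsUnique(lists, 2, n) ? "unique" : "duplicate");
    return 0;
}
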
> 
> 
> ---
>  src/conf/nwfilter_params.c |   55 +++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 53 insertions(+), 2 deletions(-)
> 
> Index: libvirt-iterator/src/conf/nwfilter_params.c
> ===================================================================
> --- libvirt-iterator.orig/src/conf/nwfilter_params.c
> +++ libvirt-iterator/src/conf/nwfilter_params.c
> @@ -348,6 +348,52 @@ virNWFilterVarCombIterAddVariable(virNWF
>  }
>  
>  /*
> + * Test whether the iterator entry points to a distinguished set of entries
> + * that have not been seen before at one of the previous iterations.
> + *
> + * The point of this function is to eliminate duplicates.
> + * Example with two lists:
> + *
> + * list1 = [1,2,1]
> + * list2 = [1,3,1]
> + *
> + * The 1st iteration would take the 1st items of each list -> 1,1
> + * The 2nd iteration would take the 2nd items of each list -> 2,3
> + * The 3rd iteration would take the 3rd items of each list -> 1,1 but
> + * skip them since this pair has already been encountered in the 1st iteration
> + */
> +static bool
> +virNWFilterVarCombIterEntryAreUniqueEntries(virNWFilterVarCombIterEntryPtr cie,
> +                                            virNWFilterHashTablePtr hash)
> +{
> +    unsigned int i, j;
> +    virNWFilterVarValuePtr varValue, tmp;
> +    const char *value;
> +
> +    varValue = virHashLookup(hash->hashTable, cie->varNames[0]);
> +
> +    value = virNWFilterVarValueGetNthValue(varValue, cie->curValue);
> +
> +    for (i = 0; i < cie->curValue; i++) {
> +        if (STREQ(value, virNWFilterVarValueGetNthValue(varValue, i))) {
> +            bool isSame = true;

This function is called in a loop over the list, and itself does a loop
over the list, making it a quadratic check for duplicates.  Am I correct
that lists can be provided by the user with arbitrary lengths?

> +            for (j = 1; j < cie->nVarNames; j++) {

Furthermore, we add another layer of looping (this time over the number
of parallel lists being combined), for an overall cubic algorithm.  Is
the number of parallel lists that can be provided by the user arbitrary?

That is, I'm wondering if we need to start worrying about reducing
complexity, and coming up with an algorithm that isn't quite so brute
force so as not to take quite so long on arbitrary-length inputs.

On the other hand, until someone actually demonstrates slow input, I'm
okay with assuming that most users won't be merging large numbers of
long parallel lists, and that reducing algorithmic complexity may cost
more overhead than simply brute-forcing all expected use cases.  So I'm
okay checking this in as-is, until (and unless) we have performance
numbers proving that we need to be smarter.
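
For the record, if we ever do need to be smarter, the usual trick would
be to remember every tuple we have already produced in a hash keyed by
the combined values, so each iteration costs one lookup instead of a
rescan of the earlier positions.  A rough standalone sketch of the idea
(hypothetical helpers, deliberately not using the existing virHash API;
error handling and cleanup omitted):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256

struct seenEntry {
    char *key;
    struct seenEntry *next;
};

static unsigned int
hashKey(const char *key)
{
    unsigned int h = 5381;               /* djb2 string hash */

    while (*key)
        h = h * 33 + (unsigned char)*key++;
    return h % NBUCKETS;
}

/* Returns false the first time a composite key is seen, true afterwards. */
static bool
tupleSeen(struct seenEntry **buckets, const char *key)
{
    unsigned int h = hashKey(key);
    struct seenEntry *e;

    for (e = buckets[h]; e; e = e->next)
        if (strcmp(e->key, key) == 0)
            return true;                 /* duplicate tuple */

    e = malloc(sizeof(*e));
    e->key = strdup(key);
    e->next = buckets[h];
    buckets[h] = e;
    return false;                        /* first occurrence */
}

int main(void)
{
    /* the two parallel lists from the commit message */
    const char *list1[] = { "1", "2", "1" };
    const char *list2[] = { "1", "3", "1" };
    struct seenEntry *buckets[NBUCKETS] = { NULL };
    char key[64];
    size_t i;

    for (i = 0; i < 3; i++) {
        /* join the values with a separator that cannot occur in them */
        snprintf(key, sizeof(key), "%s\x1f%s", list1[i], list2[i]);
        printf("position %zu: %s\n", i,
               tupleSeen(buckets, key) ? "skip (duplicate)" : "instantiate rule");
    }
    return 0;
}

That keeps duplicate detection roughly linear in the total number of
iterations, at the cost of the memory for the seen-tuple table.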

> @@ -414,10 +460,15 @@ virNWFilterVarCombIterNext(virNWFilterVa
>      unsigned int i;
>  
>      for (i = 0; i < ci->nIter; i++) {
> +next:
>          ci->iter[i].curValue++;
> -        if (ci->iter[i].curValue <= ci->iter[i].maxValue)
> +        if (ci->iter[i].curValue <= ci->iter[i].maxValue) {
> +            if (!virNWFilterVarCombIterEntryAreUniqueEntries(
> +                                              &ci->iter[i], ci->hashTable)) {
> +                goto next;

Why do we need a backwards goto, when a 'continue' would get to the same
location?
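
If it turns out 'continue' doesn't fit here (it would advance i rather
than retry the same position), one goto-free alternative might be an
inner loop, along these lines (rough untested sketch against the quoted
hunk, not a proposed patch):

for (i = 0; i < ci->nIter; i++) {
    bool found = false;

    while (++ci->iter[i].curValue <= ci->iter[i].maxValue) {
        if (virNWFilterVarCombIterEntryAreUniqueEntries(&ci->iter[i],
                                                        ci->hashTable)) {
            found = true;            /* unique tuple at this position */
            break;
        }
    }
    if (found)
        break;                       /* no need to carry into the next entry */
    ci->iter[i].curValue = 0;        /* exhausted: wrap and carry */
}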

> +            }
>              break;
> -        else
> +        } else
>              ci->iter[i].curValue = 0;
>      }

Coding style - the else needs {} since you added {} for the if.

I think this is okay, but wait until I review the rest of the series to
see how it was used in context.

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
