On 09/12/2013 07:39 AM, thierry bordaz wrote:
On 09/10/2013 04:35 PM, Ludwig Krispenz wrote:
On 09/10/2013 04:29 PM, Rich Megginson wrote:
On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:
On 09/09/2013 07:19 PM, Rich Megginson wrote:
On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:
On 09/07/2013 05:02 AM, David Boreham wrote:
On 9/6/2013 8:49 PM, Nathan Kinder wrote:
This is a good idea, and it is
something that we discussed briefly off-list. The
only downside is that we need to change the index
format to keep a count of ids for each key.
Implementing this isn't a big problem, but it does mean that the existing indexes need to be updated to populate the count based on their contents (as you mention above).
I don't think you need to do this (I certainly wasn't
advocating doing so). The "statistics" state is much
the same as that proposed in Rich's design. In fact
you could probably just use that same information. My
idea is more about where and how you use the
information. All you need is something associated with
each index that says "not much point looking here if
you're after something specific, move along, look
somewhere else instead". This is much the same
information as "don't use a high scan limit here".
In the short term, we are looking for a way to improve performance for specific search filters that cannot be modified on the client side (for whatever reason), while leaving the index file format exactly as it is. I still feel that there is potentially great value in keeping a count of ids per key, so we can optimize things on the server side automatically without the need for complex index configuration on the administrator's part. I think we should consider this as an additional future enhancement.
I'm saying the same thing. Keeping a cardinality count
per key is way more than I'm proposing, and I'm not
sure how useful that would be anyway, unless you want
to do OLAP in the DS ;)
We have the cardinality of the key in old-idl, and this makes some searches fast where parts of the filter are allids.
I'm late to the discussion, but I think Rich's proposal is very promising for addressing all the problems related to allids in new-idl.
We could then eventually rework filter ordering based on these configurations. Right now we only have filter ordering based on index type and try to postpone "<=" or similar filters, as they are known to be costly, but this could be more elaborate.
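
To make the reordering idea concrete, here is a rough sketch in Python (purely illustrative; the real ordering lives in the C backend, and the cost numbers are made-up assumptions rather than measured values):

# Illustrative sketch: order the components of an AND filter so the cheap,
# selective ones are evaluated first and costly ones (ranges, substrings,
# unindexed attributes) are postponed.
COST_BY_FILTER_TYPE = {
    "eq": 1,         # equality lookup on an indexed attribute is cheapest
    "pres": 5,       # presence lookups tend to produce large ID lists
    "sub": 10,       # substring lookups are more expensive
    "range": 20,     # "<=" / ">=" filters are known to be costly
    "unindexed": 100,
}

def order_and_components(components):
    """Sort (filter_type, attribute, value) tuples by estimated cost."""
    return sorted(components,
                  key=lambda c: COST_BY_FILTER_TYPE.get(c[0], 100))

# The range component gets postponed, the equality component runs first.
print(order_and_components([
    ("range", "modifyTimestamp", "20130101000000Z"),
    ("eq", "c3sUserID", "EndUser0000078458"),
]))

With per-key statistics or the new per-key scan limits, those cost guesses could eventually come from actual ID list sizes instead of fixed numbers.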
An alternative would be to have some kind of index
lookup caching. In the example in ticket 47474 the
filter is
(&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOfNames)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))
and probably only the "c3sUserID=xxxxx" part will change. If we cache the result for the (&(|(objectClass=... part, then even if it is expensive, it would be computed only once.
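
As a rough illustration of that caching idea (Python sketch only; the real lookup code is C in the ldbm backend, and a real cache would of course need invalidation whenever entries are added, removed or modified):

# Illustrative sketch: cache the ID list produced by a stable, expensive
# sub-filter (the big objectClass OR above), keyed by its normalized
# filter string, so repeated searches only re-evaluate the part that
# changes (c3sUserID=...).
_subfilter_cache = {}

def eval_subfilter(normalized_subfilter, evaluate):
    """Return the ID set for a sub-filter, computing it at most once.

    `evaluate` stands for whatever function actually walks the indexes
    and returns a set of entry IDs for the given sub-filter string.
    """
    ids = _subfilter_cache.get(normalized_subfilter)
    if ids is None:
        ids = evaluate(normalized_subfilter)   # the expensive index lookups
        _subfilter_cache[normalized_subfilter] = ids
    return ids

def search(stable_part, varying_part, evaluate):
    # Cached OR over the objectClass values; computed only once.
    candidate_ids = eval_subfilter(stable_part, evaluate)
    # The varying component (c3sUserID=...) is evaluated every time and
    # intersected with the cached candidates.
    return candidate_ids & evaluate(varying_part)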
Thanks everyone for the comments. I have added Noriko's
suggestion:
http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
David, Ludwig: Does the current design address your
concerns, and/or provide the necessary first step for
further refinements?
yes, the topic of filter reordering or caching could be
looked at independently.
Just one concern about the syntax:
nsIndexIDListScanLimit:
maxsize[:indextype][:flag[,flag...]][:value[,value...]]
since everything is optional, how do you decide whether, in "nsIndexIDListScanLimit: 6:eq:AND", "AND" is a value or a flag?
And since it defines limits for specific keys, could the attribute name reflect this, e.g. nsIndexKeyIDListScanLimit or nsIndexKeyScanLimit or ... ?
Thanks, yes, it is ambiguous.
I think it may have to use keyword=value, so something like
this:
nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]] [flags=AND[,OR]] [values=val[,val...]]
That should be easy to parse for both humans and machines.
For values, we will have to figure out a way to have escapes (e.g. if a value contains a comma or an escape character). I was thinking of using LDAP escapes (e.g. \, or \032).
They should be treated as in filters and normalized; in the config it should be the string representation according to the attribute type.
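
For what it's worth, a minimal parsing sketch of the proposed keyword=value form (Python, illustrative only; the exact escape rules and error handling are assumptions rather than the final design, and values containing spaces are not handled here):

# Split the attribute value on whitespace between keywords and on
# unescaped commas inside the comma-separated fields, honoring
# backslash escapes such as "\," in values.
import re

def split_unescaped_commas(text):
    """Split on commas that are not preceded by a backslash."""
    return re.split(r'(?<!\\),', text)

def parse_scan_limit(value):
    """Parse e.g. 'limit=5 type=eq flags=AND values=inetOrgPerson,top'."""
    parsed = {}
    for token in value.split():
        keyword, _, rest = token.partition('=')
        if keyword == 'limit':
            parsed['limit'] = int(rest)
        elif keyword in ('type', 'flags', 'values'):
            parsed[keyword] = split_unescaped_commas(rest)
        else:
            raise ValueError("unknown keyword: %s" % keyword)
    return parsed

print(parse_scan_limit(
    'limit=5 type=eq flags=AND values=inetOrgPerson,o=foo\\,bar'))
# -> {'limit': 5, 'type': ['eq'], 'flags': ['AND'],
#     'values': ['inetOrgPerson', 'o=foo\\,bar']}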
Hi,
I was wondering whether this configuration attribute, currently defined at the index level, could also be implemented at the bind level (per bound entry).
It could be - it would be more difficult to do - you would have to
have the nsIndexIDListScanLimit attribute specified in the user
entry, and it would have to specify the attribute type e.g.
dn: uid=admin,....
nsIndexIDListScanLimit: limit=xxxx attr=objectclass type=eq
value=inetOrgPerson
Or perhaps a new attribute - nsIndexIDListScanLimit should not be operational for use in nsIndex, but should be operational for use in a user entry.
If an application usually binds with a given entry, it could use its own limits, put for example into an operational attribute in the bound entry itself.
Yes, and we already do this for other limits.
So two applications using the same filter component could each have their own specific idlist size.
Anyway if it makes sense it could be added later.
Yes, thanks.
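
Purely as an illustration of the per-bind-DN idea (the placement and the keyword=value syntax just follow the examples above; nothing here is a final design), two application accounts could each carry their own limit, e.g.:

dn: uid=app1,ou=services,dc=example,dc=com
nsIndexIDListScanLimit: limit=100 attr=objectclass type=eq values=inetOrgPerson

dn: uid=app2,ou=services,dc=example,dc=com
nsIndexIDListScanLimit: limit=5000 attr=objectclass type=eq values=inetOrgPerson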
best regards
thierry
--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel