Re: [389-devel] RFC: New Design: Fine Grained ID List Size

Ludwig Krispenz <lkrispen@xxxxxxxxxx> · Thu, 12 Sep 2013 20:24:32 +0200



    On 09/12/2013 04:40 PM, Rich Megginson wrote:

    
      On 09/12/2013 07:39 AM, thierry bordaz wrote:

      
        On 09/10/2013 04:35 PM, Ludwig Krispenz wrote:

        
          On 09/10/2013 04:29 PM, Rich Megginson wrote: 

          On 09/10/2013 01:47 AM, Ludwig Krispenz wrote:

             
              On 09/09/2013 07:19 PM, Rich Megginson wrote: 

              On 09/09/2013 02:27 AM, Ludwig Krispenz wrote:

                 
                  On 09/07/2013 05:02 AM, David Boreham wrote: 

                  On 9/6/2013 8:49 PM, Nathan Kinder wrote:

                    This is a good idea, and it
                      is something that we discussed briefly off-list. 
                      The only downside is that we need to change the
                      index format to keep a count of ids for each key. 
                      Implementing this isn't a big problem, but it does
                      mean that the existing indexes need to be updated
                      to populate the count based off of the contents
                      (as you mention above). 

                    
                    I don't think you need to do this (I certainly
                    wasn't advocating doing so). The "statistics" state
                    is much the same as that proposed in Rich's design.
                    In fact you could probably just use that same
                    information. My idea is more about where and how you
                    use the information. All you need is something
                    associated with each index that says "not much point
                    looking here if you're after something specific,
                    move along, look somewhere else instead". This is
                    much the same information as "don't use a high scan
                    limit here". 

                    
                      In the short term, we are looking for a way to be
                      able to improve performance for specific search
                      filters that are not possible to modify on the
                      client side (for whatever reason) while leaving
                      the index file format exactly as it is.  I still
                      feel that there is potentially great value in
                      keeping a count of ids per key so we can optimize
                      things on the server side automatically without
                      the need for complex index configuration on the
                      administrator's part. I think we should consider
                      this for an additional future enhancement. 

                    
                    I'm saying the same thing. Keeping a cardinality
                    count per key is way more than I'm proposing, and
                    I'm not sure how useful that would be anyway, unless
                    you want to do OLAP in the DS ;) 

                  
                  we have the cardinality of the key in old-idl and this
                  makes some searches where parts of the filter are
                  allids fast. 

                  
                  I'm late in the discussion, but I think Rich's
                  proposal is very promising to address all the problems
                  related to allids in new-idl. 

                  
                  We could then eventually rework filter ordering based
                  on these configurations. Right now we only have a
                  filter ordering based on index type and try to
                  postpone "<=" or similar filters, as they are known
                  to be costly, but this could be more elaborate.
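The ordering idea above can be illustrated with a small Python sketch. It is not the server's code: the tuple representation of a filter component and the cost ranks in COST_RANK are assumptions made only for this example. It simply moves the components of an AND filter that are expected to be cheap to the front and postpones range comparisons such as "<=".

```python
# Hypothetical cost ranks per filter operator (assumed values, lower = cheaper).
COST_RANK = {"=": 0, "sub": 1, ">=": 2, "<=": 2}


def reorder_and_components(components):
    """Return the AND components sorted by assumed cost.

    Each component is an (attribute, operator, value) tuple. sorted() is
    stable, so components with the same rank keep their original order.
    Unknown operators are placed last.
    """
    worst = len(COST_RANK)
    return sorted(components, key=lambda c: COST_RANK.get(c[1], worst))
```

A per-key scan limit configuration could later feed into such ranks, but how that would be wired in is left open here.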

                  
                  An alternative would be to have some kind of index
                  lookup caching. In the example in ticket 47474 the
                  filter is
                  (&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOfNames)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))


                  and probably only the "c3sUserID=xxxxx" part will
                  change. If we cache the result for the
                  (&(|(objectClass=... part, then even if it is
                  expensive, it would be computed only once.
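A minimal Python sketch of this caching idea follows. Everything in it is hypothetical: the class name SubfilterCache, the lookup callable, and the use of the raw subfilter string as cache key are assumptions for illustration, not an existing server API. The point is only that the expensive objectClass part is evaluated once and then intersected with the small, changing c3sUserID result.

```python
class SubfilterCache:
    """Cache candidate ID sets per subfilter string (illustrative only)."""

    def __init__(self, lookup):
        # lookup: callable taking a filter string and returning entry IDs.
        self._lookup = lookup
        self._cache = {}
        self.misses = 0

    def ids_for(self, subfilter):
        # Assumption: the stripped filter string is a good enough key.
        # A real implementation would normalize the filter first.
        key = subfilter.strip()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = frozenset(self._lookup(subfilter))
        return self._cache[key]

    def invalidate(self):
        # Cached ID sets go stale once entries are added, changed or removed.
        self._cache.clear()


def search_and(cache, constant_part, variable_ids):
    """Intersect the cached constant part with the IDs of the changing part."""
    return sorted(cache.ids_for(constant_part) & set(variable_ids))
```

The sketch leaves out when to invalidate, which is the hard part: any write that affects the cached subfilter would have to clear or update the entry.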

                
                Thanks everyone for the comments.  I have added Noriko's
                suggestion: 

                http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
                

                David, Ludwig: Does the current design address your
                concerns, and/or provide the necessary first step for
                further refinements? 

              
              yes, the topic of filter reordering or caching could be
              looked at independently. 

              
              Just one concern about the syntax:

              
              nsIndexIDListScanLimit:
              maxsize[:indextype][:flag[,flag...]][:value[,value...]] 

              
              since everything is optional, how do you decide if in
              nsIndexIDListScanLimit: 6:eq:AND "AND" is a value or a
              flag ? 

              and as it defines limits for specific keys, could the
              attribute name reflect this, e.g. nsIndexKeyIDListScanLimit or
              nsIndexKeyScanLimit or ... ?

            
            Thanks, yes, it is ambiguous. 

            I think it may have to use keyword=value, so something like
            this: 

            
            nsIndexIDListScanLimit: limit=NNN [type=eq[,sub]]
            [flags=AND[,OR]] [values=val[,val...]]

            
            That should be easy to parse for both humans and machines. 

            For values, will have to figure out a way to have escapes
            (e.g. if a value contains a comma or an escape character).  
            Was thinking of using LDAP escapes (e.g. \, or \032) 
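To show how the keyword=value form removes the ambiguity, here is a minimal Python sketch of a parser for it. It is an illustration under stated assumptions, not the server's parser: the function names are hypothetical, tokens are assumed to be separated by whitespace (so values containing spaces are not handled), and only the backslash-plus-character escape such as "\," is supported, not numeric escapes.

```python
# Keywords taken from the proposed syntax; anything else is rejected.
KNOWN_KEYS = {"limit", "type", "flags", "values"}


def split_escaped(text, sep=","):
    """Split text on sep, treating backslash + character as that literal character."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i + 1])
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def parse_scan_limit(value):
    """Parse 'limit=NNN [type=...] [flags=...] [values=...]' into a dict."""
    result = {"limit": None, "type": [], "flags": [], "values": []}
    for token in value.split():
        key, eq, raw = token.partition("=")
        if not eq or key not in KNOWN_KEYS:
            raise ValueError(f"unrecognized token: {token!r}")
        if key == "limit":
            result["limit"] = int(raw)
        elif key == "values":
            result["values"] = split_escaped(raw)
        else:
            result[key] = raw.split(",")
    if result["limit"] is None:
        raise ValueError("limit= is required")
    return result
```

With this form, "limit=6 type=eq flags=AND" and "limit=6 type=eq values=AND" parse to different results, which the positional 6:eq:AND form could not express.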

          
          they should be treated as in filters and normalized, in the
          config it should be the string representation according to the
          attributetype 

        
        Hi,

        
        I was wondering if this configuration attribute at
          the index level, could not also be implemented at the
          bind-base level.

        
      It could be - it would be more difficult to do - you would have to
      have the nsIndexIDListScanLimit attribute specified in the user
      entry, and it would have to specify the attribute type e.g. 

      
      dn: uid=admin,....

      nsIndexIDListScanLimit: limit=xxxx attr=objectclass type=eq
      value=inetOrgPerson

      
      Or perhaps a new attribute - nsIndexIDListScanLimit should not be
      operational for use in nsIndex, but should be operational for use
      in a user entry.

    
    Or it could be handled as a policy, like password policy, have a
    default one and the possibility to assign a specific one at the bind

     
         If an application binds with a given entry,
          it could use its own limits, stored for example in an
          operational attribute of the bound entry itself.

        
      Yes, and we already do this for other limits.

      
         So that two applications, using the same filter
          component could have their specific idlist size.

          Anyway if it makes sense it could be added later.

        
      Yes, thanks.

      
        best regards

        thierry

        
                    -- 

                    389-devel mailing list 

                    389-devel@xxxxxxxxxxxxxxxxxxxxxxx
                    

                    https://admin.fedoraproject.org/mailman/listinfo/389-devel
                    
