Re: [389-devel] RFC: New Design: Fine Grained ID List Size

Nathan Kinder <nkinder@xxxxxxxxxx> · Fri, 06 Sep 2013 19:49:57 -0700

On 09/06/2013 05:30 PM, David Boreham wrote:
On 9/6/2013 3:05 PM, Rich Megginson wrote:
Please review and comment:

http://port389.org/wiki/Design/Fine_Grained_ID_List_Size

This looks interesting. I suppose this is similar to a SQL database's 
concept of index statistics, and also query hints supplied by the 
client. Perhaps more of a "server index hint".

This may already been discussed, but in reading through the design 
doc, I was wondering about having the query planner (such as there is 
one in the DS) take note of the index hints prior to executing any 
lookups. This is similar to a SQL database's behavior executing such a 
query, when it sees index statistics that indicate low cardinality. To 
expand:

I seem to recall that there is already code to avoid looking up a low 
cardinality index, if and only if the intersection predicates are 
ordered suitably, by checking the id list size between index lookups. 
Thus, if there is a filter for uid=foo & objectclass=bar (apologies 
for not using the wacky LDAP string filter syntax...), then if only 
one hit is seen from the uid=foo lookup, the objectclass=bar lookup is 
skipped. If that's still the case, then the example "bad" search would 
become good if the client were to re-order the predicates. Of course 
often the client can not be modified, so:

My thought is to add that functionality to the server -- the client 
can then submit filters without regard to the internal workings of the 
server. The server checks the predicates against Rich's new index 
hints, and can therefore make the correct ordering itself. The benefit 
would be that no additional index lookup would be done, vs one that 
meets the id limit pertaining to the index, and the administrator only 
has to know that the index has low cardinality. A further "refinement" 
would be to make a tool that populates the hint data based on analysis 
of the index content, a la SQL "UPDATE STATISTICS".
Thanks for your feedback David!

This is a good idea, and it is something that we discussed briefly 
off-list.  The only downside is that we need to change the index format 
to keep a count of ids for each key.  Implementing this isn't a big 
problem, but it does mean that the existing indexes need to be updated 
to populate the count based off of the contents (as you mention above).

In the short term, we are looking for a way to be able to improve 
performance for specific search filters that are not possible to modify 
on the client side (for whatever reason) while leaving the index file 
format exactly as it is.  I still feel that there is potentially great 
value in keeping a count of ids per key so we can optimize things on the 
server side automatically without the need for complex index 
configuration on the administrator's part.  I think we should consider 
this for an additional future enhancement.

Thanks,
-NGK

Hopefully this makes sense. Apologies if it has already been considered.

--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel

--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel