On 09/06/2013 05:30 PM, David Boreham wrote:
On 9/6/2013 3:05 PM, Rich Megginson wrote:
Please review and comment:
http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
This looks interesting. I suppose this is similar to a SQL database's
concept of index statistics, and also query hints supplied by the
client. Perhaps more of a "server index hint".
This may already been discussed, but in reading through the design
doc, I was wondering about having the query planner (such as there is
one in the DS) take note of the index hints prior to executing any
lookups. This is similar to a SQL database's behavior executing such a
query, when it sees index statistics that indicate low cardinality. To
expand:
I seem to recall that there is already code to avoid looking up a low
cardinality index, if and only if the intersection predicates are
ordered suitably, by checking the id list size between index lookups.
Thus, if there is a filter for uid=foo & objectclass=bar (apologies
for not using the wacky LDAP string filter syntax...), then if only
one hit is seen from the uid=foo lookup, the objectclass=bar lookup is
skipped. If that's still the case, then the example "bad" search would
become good if the client were to re-order the predicates. Of course
often the client can not be modified, so:
My thought is to add that functionality to the server -- the client
can then submit filters without regard to the internal workings of the
server. The server checks the predicates against Rich's new index
hints, and can therefore make the correct ordering itself. The benefit
would be that no additional index lookup would be done, vs one that
meets the id limit pertaining to the index, and the administrator only
has to know that the index has low cardinality. A further "refinement"
would be to make a tool that populates the hint data based on analysis
of the index content, a la SQL "UPDATE STATISTICS".
Thanks for your feedback David!
This is a good idea, and it is something that we discussed briefly
off-list. The only downside is that we need to change the index format
to keep a count of ids for each key. Implementing this isn't a big
problem, but it does mean that the existing indexes need to be updated
to populate the count based off of the contents (as you mention above).
In the short term, we are looking for a way to be able to improve
performance for specific search filters that are not possible to modify
on the client side (for whatever reason) while leaving the index file
format exactly as it is. I still feel that there is potentially great
value in keeping a count of ids per key so we can optimize things on the
server side automatically without the need for complex index
configuration on the administrator's part. I think we should consider
this for an additional future enhancement.
Thanks,
-NGK
Hopefully this makes sense. Apologies if it has already been considered.
--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel
--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel