Re: [389-devel] RFC: New Design: Fine Grained ID List Size

Ludwig Krispenz <lkrispen@xxxxxxxxxx> · Mon, 09 Sep 2013 10:27:42 +0200

On 09/07/2013 05:02 AM, David Boreham wrote:
On 9/6/2013 8:49 PM, Nathan Kinder wrote:
This is a good idea, and it is something that we discussed briefly 
off-list.  The only downside is that we need to change the index 
format to keep a count of ids for each key.  Implementing this isn't 
a big problem, but it does mean that the existing indexes need to be 
updated to populate the count based off of the contents (as you 
mention above).

I don't think you need to do this (I certainly wasn't advocating doing 
so). The "statistics" state is much the same as that proposed in 
Rich's design. In fact you could probably just use that same 
information. My idea is more about where and how you use the 
information. All you need is something associated with each index that 
says "not much point looking here if you're after something specific, 
move along, look somewhere else instead". This is much the same 
information as "don't use a high scan limit here".

In the short term, we are looking for a way to be able to improve 
performance for specific search filters that are not possible to 
modify on the client side (for whatever reason) while leaving the 
index file format exactly as it is.  I still feel that there is 
potentially great value in keeping a count of ids per key so we can 
optimize things on the server side automatically without the need for 
complex index configuration on the administrator's part. I think we 
should consider this for an additional future enhancement.

I'm saying the same thing. Keeping a cardinality count per key is way 
more than I'm proposing, and I'm not sure how useful that would be 
anyway, unless you want to do OLAP in the DS ;)
we have the cardinality of the key in old-idl and this makes some 
searches where parts of the filter are allids fast.

I'm late in the discussion, but I think Rich's proposal is very 
promising to address all the problems related to allids in new-idl.

We could then eventually rework filter ordering based on these 
configurations. Right now we only have a filter ordering based on index 
type and try to postpone "<=" or similar filter as they are known to be 
costly, but this could be more elaborate.

An alternative would be to have some kind of index lookup caching. In 
the example in ticket 47474 the filter is 
(&(|(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(objectClass=organization)(objectClass=organizationalUnit)(objectClass=groupOf
Names)(objectClass=groupOfUniqueNames)(objectClass=group))(c3sUserID=EndUser0000078458))" 
and probably only the "c3sUserID=xxxxx" part will change, if we cache 
the result for the (&(|(objectClass=... part, even if it is expensive, 
it would be done only once.

--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel

--
389-devel mailing list
389-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/389-devel