On Sat, Jun 4, 2016 at 2:50 AM, Greg Navis <contact@xxxxxxxxxxxxx> wrote: > Thanks for your replies. > > Sorry for confusion. Instead of `similarity(lhs, rhs) >= show_limit()`, > which of course is completely equivalent to `lhs % rhs`, I wanted to write > `similarity(lhs, rhs) >= my_custom_threshold`. It seems that the approach > with ternary operators is quite a bit of work. I might have a simpler idea: The problem is that you would have to build your index on the expression (similarity(lhs,rhs)), and the then one of the lhs or rhs would have to be a column, and the other would have to be a constant string at the time you define the index, which is the same constant string as specified when you want to use the index. This could work as long as there are only a few strings you ever care about checking similarity to, and you would build an index for each one. These would be ordinary btree indexes, not gin indexes. I've done this before, only using substring rather than similarity. > pg_trgm also provides `<->` but it seems this operator doesn't use indexes > either. It seems the shortest path to per-query thresholds, without > compromising the design, is making this operator use the index. To be efficient, the index has to know where to start and stop its scan. Such a thing doesn't exists for "a<->b", there is no stopping point. It would have to be "a <-> b < c", at which point you are back to ternary again. > Please help > me understand whether my reasoning is correct. If it is, I'd appreciate a > high-level overview of what needs to be done. I can block a few hours to > work on this in the upcoming weeks. I think that you are proposing rewriting the indexing architecture from scratch. Doing that in a way that is robust and tested and backward compatible would probably take years, not hours, for a single individual who is not already extensively experienced in the internals of the code. And, it is not really clear even what you want to rewrite it into. Tom's proposal is to do it in a way that works with the existing architecture, rather than trying to re-write that architecture. (Which could probably not be done in a few hours, either, especially not once you write documentation and upgrade scripts and all the other stuff that goes into production-quality code.) I don't know if this would even be appropriate as an addition to pg_trgm. We might want to fork that code instead. That would be a shame, because the underlying c code would be the fundamentally the same, but the alternative would be to force people who like % and set_limit() to carry around the baggage of new operators and types they have no interest in using, and vice versa. True, we did just add several new functions and operators to pg_trgm that many people will have no interest in, so maybe that is not a big deal. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general