2016-06-11 13:47 GMT+03:00 Greg Navis <contact@xxxxxxxxxxxxx>:
I made some progress but I'm stuck. I'm focused on GiST for now. Please ignore sloppy naming for now.I made the following changes to pg_trgm--1.2.sql:CREATE TYPE pg_trgm_match AS (match TEXT, threshold REAL);CREATE OR REPLACE FUNCTION trgm_check_match(string TEXT, match pg_trgm_match) RETURNS bool AS $$BEGINRETURN match.match <-> string <= 1 - match.threshold;END;$$ LANGUAGE plpgsql;CREATE OPERATOR %%(leftarg = text, rightarg = pg_trgm_match, procedure=trgm_check_match);ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADDOPERATOR 9 %% (text, pg_trgm_match);
You can overload existing % operator:
ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
OPERATOR 9 % (text, pg_trgm_match);
It does indeed make PostgreSQL complain about undefined strategy 9. I added the following define to trgm.h:#define ThresholdStrategyNumber 9It seems StrategyNumber is used in gtrgm_consistent and gtrgm_distance.In gtrgm_consistent, I need change the way `nlimit` is obtained:nlimit = (strategy == SimilarityStrategyNumber) ?similarity_threshold : word_similarity_threshold;I need to add a case for ThresholdStrategyNumber and extract `nlimit` from the argument of `pg_trgm_match`. I'm not sure what to do in `gtrgm_distance`.My questions:1a. Is it possible to make `gtrgm_consistent` accept `text` or `pg_trgm_match` as the second argument?
I think you can change definition of the gtrgm_consistent() in .sql file in CREATE FUNCTION and CREATE OPERATOR CLASS commands to:
gtrgm_consistent(internal,anynonarray,smallint,oid,internal)
But I do not sure that anynonarray is good here.
1b. What's the equivalent of `match.match` and `match.threshold` (where `match` is a `pg_trgm_match`) in C?
After changing the definition you can extract values from composite type in the gtrgm_consistent(). I think the code in the beginning of function may looks like this:
if (strategy == SimilarityStrategyNumber ||
strategy == WordSimilarityStrategyNumber)
{
query = PG_GETARG_TEXT_P(1);
nlimit = (strategy == SimilarityStrategyNumber) ?
similarity_threshold : word_similarity_threshold;
}
else if (strategy == ThresholdStrategyNumber)
{
HeapTupleHeader query_match = PG_GETARG_HEAPTUPLEHEADER(1);
Oid tupType = HeapTupleHeaderGetTypeId(query_match);
int32 tupTypmod = HeapTupleHeaderGetTypMod(query_match);
TupleDesc tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
HeapTupleData tuple;
bool isnull;
tuple.t_len = HeapTupleHeaderGetDatumLength(query_match);
ItemPointerSetInvalid(&(tuple.t_self));
tuple.t_tableOid = InvalidOid;
tuple.t_data = query_match;
query = DatumGetTextP(fastgetattr(&tuple, 1, tupdesc, &isnull));
nlimit = DatumGetFloat4(fastgetattr(&tuple, 2, tupdesc, &isnull));
ReleaseTupleDesc(tupdesc);
}
else
query = PG_GETARG_TEXT_P(1);
After this code you should execute the query using index:
select t,similarity(t,'qwertyu0988') as sml from test_trgm where t % row('qwertyu0988', 0.6)::pg_trgm_match;
I got the query from the regression test. And of course the code need to be checked for bugs.
2. What to do with `gtrgm_distance`?
You do not need to change gtrgm_distance(). It is used only in ORDER BY clause to calculate distances. To calculate distance you do not need threshold.
Thanks for help.--Greg NavisI help tech companies to scale Heroku-hosted Rails apps.