On Wed, 19 Apr 2017 16:28:13 -0700, Rj Ewing <ewing.rj@xxxxxxxxx> wrote: >okay, messing around a bit more with the secondary k,v table it seems like >this could be a good solution.. > >I created a keys table to hold the 63 key values, then I dropped and >recreated the secondary table, using a FK referencing the keys table. I'm >not really sure why, but a basic full text query on 44 million row is >taking aproxx. 20ms. That pretty much confirms your statistics were bad ... using the FK table or not wouldn't make any difference to the planner. But if you are getting 20ms on 44M rows, then one or more of the following must be true: - your text values must be very short - your FTS queries must be very simple - you aren't reading the results For comparison: I have an application that does FTS on a table of NAICS descriptions indexed using tsvectors with an average length of 4.8 tokens per. It does a 3-part All/Any/None term search. On my 24-core 2.4GHz server, a single threaded query with the whole table and index in memory takes ~1 ms to search 20K rows using a realistic tsquery: e.g., SELECT code,description FROM naics WHERE ts_index @@ to_tsquery('packaged & software & !(wholesale)') [getting the data out of Postgresql takes longer than the search] GIN indexes don't exactly scale linearly, and tsquery is, in general, much more dependent on the lengths of the tsvectors than on the complexity of the match, but with 44M rows of similarly distributed data, a similarly realistic query would be expected to take well over 1 second. My example is genuine but too small to bother parallelizing [mentioned in a previous message]. Since you are *testing* with 1M records (that create 44M k:v shards), I am assuming you will need to deal with much more than that in deployment. And if you think you need FTS, then you must be expecting more than simple word matches [as below], else you might do something simpler like SELECT ... WHERE val ILIKE <value> >my table structure is: > > Table "public.samples_lg_txt" > Column | Type | Modifiers >--------+----------+----------- > id | integer | > key | integer | > val | text | > tsv | tsvector | >Indexes: > "idx_tsv_samples_lg_text" gin (tsv) >Foreign-key constraints: > "samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id) > ON DELETE CASCADE > "samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id) > > >how would I write an AND query that filtered on 2 separate keys from the >samples_lg_txt table? > >something like: > >SELECT COUNT(*) FROM samples WHERE id IN ( SELECT DISTINCT(s.id) FROM >samples_lg_txt s JOIN keys k ON s.key = k.id WHERE (*name = 'key1' AND tsv >@@ to_tsquery('value1')) AND (name = 'key2' AND tsv @@ >to_tsquery('value2'))*; You're overthinking it SELECT count(distinct s.id) FROM samples_lg_txt AS s JOIN keys AS k ON k.id = s.key WHERE (k.name = 'key1' AND s.tsv @@ to_query('value1') OR (k.name = 'key2' AND s.tsv @@ to_query('value2') There's actually no need to join if you can use the key name instead of an integer id. You can FK on strings, so you can still maintain an identity table of keys. E.g., > id | integer | > key | vchar(32) | FK key(name) ... > val | text | > tsv | tsvector | Then the query could be just SELECT count(distinct id) FROM samples_lg_txt WHERE (key = 'key1' AND tsv @@ to_query('value1') OR (key = 'key2' AND tsv @@ to_query('value2') Just a reminder [it's late here 8-)]: FK columns contain values - not weird references to the foreign tables. The constraint just enforces that any value inserted/updated into the FK column matches an existing value in the relevant foreign table. George -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general