Hello, Le 8/07/09 0:52, John Cheng a écrit : > I don't mean to be pesky. I was just wondering if there is anything > else I should try? > > Should I simply rewrite all queries, change the form > > WHERE textarr && '{foo, bar}'::text[] > > To > > WHERE (textarr && '{foo}'::text[] > OR textarr && '{bar}'::text[]) > > ? > While still puzzled by the big runtime difference you report between the 2 condition forms, I went on assessing these runtimes on my side from the new case statements that are assumed to reflect more the real world. Here are some measure results I got: (sorry for this long table) seq style runtime --- ----- ------- (db=slf) N01 OR-EA 6 237 N02 CC-EA 5 250 N03 OR+EA 12 628 N04 CC+EA 12 700 N05 OR+EA 15 679 N06 CC+EA 11 510 N07 CC-EA 7 712 N08 OR-EA 8 741 N09 CC-EA 4 963 N10 OR-EA 6 499 (db=stg) N11 CC+EA 15 978 N12 OR+EA 15 350 N13 CC-EA 8 102 N14 OR-EA 9 428 N15 OR-EA 5 267 N16 CC-EA 5 017 N17 OR-EA 6 119 N18 CC-EA 4 955 N19 OR+EA 11 722 N20 CC+EA 11 532 N21 OR-EA 7 303 N22 CC-EA 5 438 N23 CC-EA 5 519 N24 OR-EA 5 373 N25 OR-EA 5 422 N26 CC-EA 5 064 (db=stg) N27 CC-EA 8 314 (db=slf) N28 OR-EA 6 656 (db=stg) N29 OR-EA 6 760 (db=slf) N30 CC-EA 6 753 (db=stg) N31 CC-EA 5 500 (db=slf) N32 OR-EA 5 907 (db=stg) N33 OR-EA 5 391 (db=slf) N34 CC-EA 5 517 --- ----- ---------- Legend ------ seq: sequence order. style: condition style of query. CC: style "arr&&{f,b}" (one clause with multi-value text table). OR: style "arr&&{f} or arr&&{b}" (many clauses with 1-value text table). OR2: same style as style OR, with explicit JOIN in query expression. +EA: performed with EXPLAIN ANALYZE on query. Slower. -EA: performed without EXPLAIN ANALYZE on query. Faster. runtime: run time in milliseconds. (db=?): indicates that the following sequences have been performed after a drop-and-create process for all the tables and indexes. ------ Results from 2 selected EXPLAIN ANALYZE sequences: -- seq 03 (OR+EA) Aggregate (cost=37630.52..37630.53 rows=1 width=0) (actual time=12628.182..12628.184 rows=1 loops=1) -> Hash Join (cost=25989.12..37601.04 rows=11792 width=0) (actual time=8796.002..12231.422 rows=300000 loops=1) Hash Cond: ((bar.id)::numeric = foo.bar_id) -> Seq Scan on bar (cost=0.00..4328.00 rows=300000 width=4) (actual time=0.025..402.458 rows=300000 loops=1) -> Hash (cost=24636.81..24636.81 rows=82425 width=8) (actual time=8795.027..8795.027 rows=2097152 loops=1) -> Bitmap Heap Scan on foo (cost=1565.44..24636.81 rows=82425 width=8) (actual time=670.248..5098.109 rows=2097152 loops=1) Recheck Cond: ((keywords && '{ford}'::text[]) OR (keywords && '{toyota}'::text[]) OR (keywords && '{volkswagen}'::text[]) OR (keywords && '{saturn}'::text[]) OR (keywords && '{honda}'::text[]) OR (keywords && '{porsche}'::text[]) OR (keywords && '{hummer}'::text[]) OR (keywords && '{ferrari}'::text[])) -> BitmapOr (cost=1565.44..1565.44 rows=83879 width=0) (actual time=665.516..665.516 rows=0 loops=1) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=114.013..114.013 rows=262144 loops=1) Index Cond: (keywords && '{ford}'::text[]) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=72.398..72.398 rows=262144 loops=1) Index Cond: (keywords && '{toyota}'::text[]) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=74.118..74.118 rows=262144 loops=1) Index Cond: (keywords && '{volkswagen}'::text[]) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=58.486..58.486 rows=262144 loops=1) Index Cond: (keywords && '{saturn}'::text[]) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=114.671..114.671 rows=524288 loops=1) Index Cond: (keywords && '{honda}'::text[]) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=115.290..115.290 rows=524288 loops=1) Index Cond: (keywords && '{porsche}'::text[]) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=58.184..58.184 rows=262144 loops=1) Index Cond: (keywords && '{hummer}'::text[]) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=58.336..58.336 rows=262144 loops=1) Index Cond: (keywords && '{ferrari}'::text[]) Total runtime: 12628.401 ms -- seq 03 (OR+EA) -- seq 04 (CC+EA) Aggregate (cost=26726.37..26726.38 rows=1 width=0) (actual time=12700.620..12700.621 rows=1 loops=1) -> Hash Join (cost=17879.62..26722.62 rows=1500 width=0) (actual time=8482.572..12272.449 rows=300000 loops=1) Hash Cond: ((bar.id)::numeric = foo.bar_id) -> Seq Scan on bar (cost=0.00..4328.00 rows=300000 width=4) (actual time=0.029..412.984 rows=300000 loops=1) -> Hash (cost=17748.56..17748.56 rows=10485 width=8) (actual time=8481.524..8481.524 rows=2097152 loops=1) -> Bitmap Heap Scan on foo (cost=177.69..17748.56 rows=10485 width=8) (actual time=978.464..4679.954 rows=2097152 loops=1) Recheck Cond: (keywords && '{ford,toyota,volkswagen,saturn,honda,porsche,hummer,ferrari}'::text[]) -> Bitmap Index Scan on foo_idx (cost=0.00..175.07 rows=10485 width=0) (actual time=973.569..973.569 rows=2097152 loops=1) Index Cond: (keywords && '{ford,toyota,volkswagen,saturn,honda,porsche,hummer,ferrari}'::text[]) Total runtime: 12700.838 ms -- seq 04 (CC+EA) As far as I tried, the runtime difference between the 2 forms varies up to 10% and always under 1 second--ie. far away from the 100% and (15-8=) 7 seconds reported. Moreover the form "arr&&{f,b}" (let us call it: form 1) often shows better than the other (let us call it: form 2)--already noticed this fact by previous measures without joins. Exception occurs in a special case: for the first query submitted immediately after populating tables. It seems as if indexes were not optimized at this time. But all the following queries, whatever form is considered, run in lesser and similar time. When observing the runtime of intermediate steps (bitmap heap scan on table foo [step 1], sequential scan on table bar [step 2], hash join [step 3]), it can be noted that all 3 steps run in a comparable time whatever form used: - no relevant difference for steps 2 and 3, but an attended long time for retrieving all the resulting rows because of their high number; - for step 1: form 1 (arr&&{f,b}) needs more time to extract the 1st row of the resultset but is faster to collect the last rows. So no significant difference may be also revealed here. As I mentioned, I noticed that the runtime is longer for the 1st query performed after populating tables and that, since the 2nd query performed, runtimes clearly decrease towards a mean (about 6 seconds on my side). Maybe the runtimes reported (15 and 8 seconds respectively) occur in such a case (just after populating tables) with a performing order starting with form 1. As much as I saw, the sequence order acts upon the runtime reduction. Moreover successive queries tend to mean and equal runtimes with the 2 forms. About the user complaint, I would be not so astonished because of the duration of one query (more than 5 seconds in the 2 cases). There are so many rows to select that it seems not really possible to get a lesser runtime. What I may conclude: query plans show difference in performing the queries: each Where clause is assessed before a commun heap scan of table foo. The Where clause of the form 1 (multi-value text table) is longer to run because of multi-value to test for each key; the Where clause of the form 2 (1-value text table) is faster but the addition of the corresponding 1-value text table Where clauses runs in a comparable time. This observation resulted from measures on my sides and may not correctly reflect some proper configurations of the real production environment. Opting to form 2 (arr&&{f} or arr&&{b}) will surely not make runtime worst. So it would not be a bad choice to implement it if convinced. Another way could concern the hash join. It has been shown that this step costs a lot with respect to the overall runtime. Depending on available storage space and DBMS load, a kind of materialized view may be handled in order to cut off the overloading join. Here are some suggested statements to create this helper table: -- Creates a table jfoobar wrt. tables foo and bar -- (max 10 seconds elapsed on my platform) CREATE OR REPLACE TABLE jfoobar AS SELECT foo.bar_id, foo.keywords FROM foo INNER JOIN bar ON foo.bar_id = bar.id; -- Creates an index for jfoobar -- (max 3 seconds elapsed on my platform) -- this way... CREATE INDEX jfoobar_idx ON jfoobar(bar_id, keywords); -- or this one... DROP INDEX jfoobar_idx; CREATE INDEX jfoobar_idx ON jfoobar USING gin(keywords); -- This table may be updated or replaced on a regular time base or by triggered event on updating foo or bar tables. Finally, any query of the form 1 or 2 will run in less than 300 ms: -- form 1: unique text table SELECT count(*) FROM jfoobar WHERE keywords && '{ford, toyota, volkswagen, saturn, honda, porsche, hummer, ferrari}'::text[]; -- form 2: developped text table SELECT count(*) FROM jfoobar WHERE ( keywords && '{ford}'::text[] OR keywords && '{toyota}'::text[] OR keywords && '{volkswagen}'::text[] OR keywords && '{saturn}'::text[] OR keywords && '{honda}'::text[] OR keywords && '{porsche}'::text[] OR keywords && '{hummer}'::text[] OR keywords && '{ferrari}'::text[] ); -- The runtime would only depend on the update strategy for the join table. Regards. -- nha / Lyon / France. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general