----- "nha" <lyondif02@xxxxxxx> wrote: > From: "nha" <lyondif02@xxxxxxx> > To: "John Cheng" <jlcheng@xxxxxxxxx> > Cc: pgsql-general@xxxxxxxxxxxxxx > Sent: Monday, July 6, 2009 9:12:22 AM GMT -08:00 US/Canada Pacific > Subject: Re: Problem search on text arrays, using the overlaps (&&) operator > > Hello, > > With your exhaustive example statements based on table foo and cars, I > > performed some measures on my side (PostgreSQL 8.3.1 server). Here are > > some statistical results: > [ ... snipped ... ] > > In my opinion, analysis and optimization may be deepen over table > indexes used for join planning. As your reported query plans show, the > > Where clauses are performed independantly from the table ml_lead; the > > reason is that all the attributes of the clauses belong to the table > lead_reporting_data. Time may be reduced on join condition > achievements. > > Hoping this observation will contribute a little to your opinion. > > Without any claim, I attached a document to this email for details on > > the measures I took with the overlap operator -- OpenDocument > Spreadsheet (ODS) v2 formatted file, 24 kiB. The 3rd sheet "various" > presents the detailed measures related to the data reported in this > email. > > Regards. > > -- > nha / Lyon / France. Hi nha, I had not expected anyone to go to such lengths to evaluate my situation, thank you so much! After looking at your analysis, I realized that the test case I created isn't close enough to the queries running in our prod environment. For one, table 'foo' does not join to another table; The other thing is that the amount of data isn't the same; Finally, these tables have been ANALYZED. So I took some time to update the test case. On our server, running 8.3.6, I was able to reproduce the difference between the two styles: "arr&&{f,b}" and "arr&&{f} or arr&&{b}". First, the setup: -- Begin test case -- Sets up 'bar' SELECT id INTO TEMP TABLE bar FROM (SELECT generate_series(1,300000) as id) AS bar; CREATE INDEX bar_idx ON bar (id); ANALYZE bar; -- Sets up 'foo' CREATE TEMP SEQUENCE foo_bar_id_seq; CREATE TEMP TABLE foo ( bar_id numeric DEFAULT NEXTVAL('foo_bar_id_seq'), keywords text[] ); CREATE INDEX foo_idx ON foo USING gin (keywords); INSERT INTO foo (keywords) VALUES ('{ford}'::text[]); INSERT INTO foo (keywords) VALUES ('{toyota}'::text[]); INSERT INTO foo (keywords) VALUES ('{volkswagen}'::text[]); INSERT INTO foo (keywords) VALUES ('{saturn}'::text[]); INSERT INTO foo (keywords) VALUES ('{honda}'::text[]); INSERT INTO foo (keywords) VALUES ('{porsche}'::text[]); INSERT INTO foo (keywords) VALUES ('{porsche, audi, chrysler}'::text[]); INSERT INTO foo (keywords) VALUES ('{honda, hummer, ferrari}'::text[]); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); INSERT INTO foo (keywords) (SELECT keywords FROM foo); ANALYZE foo; -- End test case Query for the form "arr&&{f,b}" SELECT count(*) FROM foo INNER JOIN bar ON foo.bar_id = bar.id WHERE foo.keywords && '{ford, toyota, volkswagen, saturn, honda, porsche, hummer, ferrari}'::text[]; Query for the form "arr&&{f} or arr&&{b}": SELECT count(*) FROM foo, bar WHERE foo.bar_id = bar.id AND ( keywords && '{ford}'::text[] OR keywords && '{toyota}'::text[] OR keywords && '{volkswagen}'::text[] OR keywords && '{saturn}'::text[] OR keywords && '{honda}'::text[] OR keywords && '{porsche}'::text[] OR keywords && '{hummer}'::text[] OR keywords && '{ferrari}'::text[] ); For the first form, "arr&&{f,b}", the query takes about 15 seconds. For the second form "arr&&{f} or arr&&{b}", we get about 8 seconds. The difference is around 90-100%, which is what I am seeing on our real life queries. The query plans also become similar to the real life query plans. But I am having a hard time learning from them. The only interesting I see is that the estimated cost seems to be different than the actual run time. The second form has a higher estimated cost than the first form, but has a lower run time. In this test case, the query filters by any of 8 keywords. Note that with just 2 keywords, the difference is only about 5 seconds. The specific report that our users complained about involved 16 "keywords", where the difference is about 100%. When you said "analysis and optimization may be deepen over table indexes used for join planning", I'm not sure what you mean. Can you clarify? -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general