Some other approaches you could try: 1) What about an hashed index? You could make CREATE INDEX ON FIELD (unit_id, hashtext(field_name)) and changing your query accordingly.... "....where hashtext(FIELD.FIELD_NAME)=hashtext('SHEETS_PRESENT') ...." 2) Partitioning (not native yet, but can be simulated through inheritance), like in https://www.postgresql.org/docs/current/static/ddl-partitioning.html This could work well if you have a sort of limited different values in FIELD.FIELD_NAME Gerardo ----- Mensaje original ----- > De: "Alessandro Ferrucci" <alessandroferrucci@xxxxxxxxx> > Para: pgsql-performance@xxxxxxxxxxxxxx > Enviados: Miércoles, 26 de Abril 2017 0:19:37 > Asunto: Re: Slow query with 3 table joins > > > > After about 40 inutes the slow query finally finished and the result > of the EXPLAIN plan can be found here: > > > https://explain.depesz.com/s/BX22 > > > Thanks, > Alessandro Ferrucci > > > On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci < > alessandroferrucci@xxxxxxxxx > wrote: > > > > > Hello - I am migrating a current system to PostgreSQL and I am having > an issue with a relatively straightforward query being extremely > slow. > > > The following are the definitions of the tables: > > > CREATE TABLE popt_2017.unit > ( > id serial NOT NULL, > unit_id text, > batch_id text, > create_date timestamp without time zone DEFAULT now(), > update_date timestamp without time zone, > CONSTRAINT unit_pkey PRIMARY KEY (id) > ) > WITH ( > OIDS=FALSE > ); > > > CREATE TABLE popt_2017.field > ( > id serial NOT NULL, > unit_id integer, > subunit_data_id integer, > field_name character varying(50), > page_id character varying(20), > page_type character varying(20), > batch_id character varying(20), > file_name character varying(20), > data_concept integer, > "GROUP" integer, > omr_group integer, > pres integer, > reg_data text, > ocr_conf text, > ocr_dict text, > ocr_phon text, > create_date timestamp without time zone DEFAULT now(), > update_date timestamp without time zone, > CONSTRAINT field_pkey PRIMARY KEY (id), > CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id) > REFERENCES popt_2017.subunit (id) MATCH SIMPLE > ON UPDATE NO ACTION ON DELETE NO ACTION, > CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id) > REFERENCES popt_2017.unit (id) MATCH FULL > ON UPDATE NO ACTION ON DELETE NO ACTION, > CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id) > REFERENCES popt_2017.unit (id) MATCH SIMPLE > ON UPDATE NO ACTION ON DELETE NO ACTION > ) > WITH ( > OIDS=FALSE > ); > > > CREATE TABLE popt_2017.answer > ( > id serial NOT NULL, > field_id integer, > ans_status integer, > ans text, > luggage text, > arec text, > kfi_partition integer, > final boolean, > length integer, > create_date timestamp without time zone DEFAULT now(), > update_date timestamp without time zone, > CONSTRAINT answer_pkey PRIMARY KEY (id), > CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id) > REFERENCES popt_2017.field (id) MATCH FULL > ON UPDATE NO ACTION ON DELETE NO ACTION, > CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id) > REFERENCES popt_2017.field (id) MATCH SIMPLE > ON UPDATE NO ACTION ON DELETE NO ACTION > ) > WITH ( > OIDS=FALSE > ); > > > Below are the index definitions for those tables: > > > UNIT: > CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id); > CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id); > > > FIELD: > CREATE UNIQUE INDEX field_pkey ON field USING btree (id) > CREATE INDEX field_unit_id_idx ON field USING btree (unit_id) > CREATE INDEX field_subunit_id_idx ON field USING btree > (subunit_data_id) > CREATE INDEX field_field_name_idx ON field USING btree (field_name) > > > ANSWER: > CREATE UNIQUE INDEX answer_pkey ON answer USING btree (id) > CREATE INDEX answer_field_id_idx ON answer USING btree (field_id) > CREATE INDEX answer_ans_idx ON answer USING btree (ans) > > > The tables each have the following number of rows: > > > UNIT: 10,315 > FIELD: 139,397,965 > ANSWER: 3,463,300 > > > The query in question is: > > > SELECT > UNIT.ID AS UNIT_ID, > UNIT.UNIT_ID AS UNIT_UNIT_ID, > UNIT.BATCH_ID AS UNIT_BATCH_ID, > UNIT.CREATE_DATE AS UNIT_CREATE_DATE, > UNIT.UPDATE_DATE AS UNIT_UPDATE_DATE > FROM > UNIT, FIELD, ANSWER > WHERE > UNIT.ID =FIELD.UNIT_ID AND > FIELD.ID =ANSWER.FIELD_ID AND > FIELD.FIELD_NAME='SHEETS_PRESENT' AND > ANSWER.ANS='2'; > > > I attempted to run an EXPLAIN (ANALYZE,BUFFERS) and the query has > been running for 32 minutes now, So I won't be able to post the > results (as I've never been able to get the query to actually > finish. > > > But, if I remove the join to UNIT (and just join FIELD and ANSWER) > the resulting query is sufficiently fast, (the first time it ran in > roughly 3 seconds), the query as such is: > > > SELECT * FROM > ANSWER, FIELD > WHERE > FIELD.ID =ANSWER.FIELD_ID AND > FIELD.FIELD_NAME='SHEETS_PRESENT' AND > ANSWER.ANS='2'; > > > The EXPLAIN ( ANALYZE, BUFFERS ) output of that query can be found > here https://explain.depesz.com/s/ueJq > > > These tables are static for now, so they do not get DELETEs or > INSERTS at all and I have run VACUUM ANALYZE on all the affected > tables. > > > I'm running PostgreSQL PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, > compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit > > > I'm running this on RHEL 6.9 > > > On a server with 32 GB of ram, 2 CPUs. > > > The following are the changes to postgresql.conf that I have made: > > > shared_buffers = 7871MB > effective_cache_size = 23611MB > work_mem = 1000MB > maintenance_work_mem = 2048MB > > > I have not changed the autovacuum settings, but since the tables are > static for now and I've already ran VACUUM that should not have any > effect. > > > Any assistance that could be provided is greatly appreciated. > > > Thank you, > Alessandro Ferrucci > > > > > > > > > > > > -- > > Signed, > Alessandro Ferrucci -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance