Re: Slow query with 3 table joins

Gerardo Herzig <gherzig@xxxxxxxxxxx> · Wed, 26 Apr 2017 11:00:52 -0400 (EDT)

Some other approaches you could try:

1) What about an hashed index? You could make 
CREATE INDEX ON FIELD (unit_id, hashtext(field_name))

and changing your query accordingly....

"....where hashtext(FIELD.FIELD_NAME)=hashtext('SHEETS_PRESENT') ...."

2) Partitioning (not native yet, but can be simulated through inheritance), like in
https://www.postgresql.org/docs/current/static/ddl-partitioning.html
This could work well if you have a sort of limited different values in FIELD.FIELD_NAME

Gerardo

----- Mensaje original -----
> De: "Alessandro Ferrucci" <alessandroferrucci@xxxxxxxxx>
> Para: pgsql-performance@xxxxxxxxxxxxxx
> Enviados: Miércoles, 26 de Abril 2017 0:19:37
> Asunto: Re:  Slow query with 3 table joins
> 
> 
> 
> After about 40 inutes the slow query finally finished and the result
> of the EXPLAIN plan can be found here:
> 
> 
> https://explain.depesz.com/s/BX22
> 
> 
> Thanks,
> Alessandro Ferrucci
> 
> 
> On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci <
> alessandroferrucci@xxxxxxxxx > wrote:
> 
> 
> 
> 
> Hello - I am migrating a current system to PostgreSQL and I am having
> an issue with a relatively straightforward query being extremely
> slow.
> 
> 
> The following are the definitions of the tables:
> 
> 
> CREATE TABLE popt_2017.unit
> (
> id serial NOT NULL,
> unit_id text,
> batch_id text,
> create_date timestamp without time zone DEFAULT now(),
> update_date timestamp without time zone,
> CONSTRAINT unit_pkey PRIMARY KEY (id)
> )
> WITH (
> OIDS=FALSE
> );
> 
> 
> CREATE TABLE popt_2017.field
> (
> id serial NOT NULL,
> unit_id integer,
> subunit_data_id integer,
> field_name character varying(50),
> page_id character varying(20),
> page_type character varying(20),
> batch_id character varying(20),
> file_name character varying(20),
> data_concept integer,
> "GROUP" integer,
> omr_group integer,
> pres integer,
> reg_data text,
> ocr_conf text,
> ocr_dict text,
> ocr_phon text,
> create_date timestamp without time zone DEFAULT now(),
> update_date timestamp without time zone,
> CONSTRAINT field_pkey PRIMARY KEY (id),
> CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
> REFERENCES popt_2017.subunit (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
> REFERENCES popt_2017.unit (id) MATCH FULL
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
> REFERENCES popt_2017.unit (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
> OIDS=FALSE
> );
> 
> 
> CREATE TABLE popt_2017.answer
> (
> id serial NOT NULL,
> field_id integer,
> ans_status integer,
> ans text,
> luggage text,
> arec text,
> kfi_partition integer,
> final boolean,
> length integer,
> create_date timestamp without time zone DEFAULT now(),
> update_date timestamp without time zone,
> CONSTRAINT answer_pkey PRIMARY KEY (id),
> CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
> REFERENCES popt_2017.field (id) MATCH FULL
> ON UPDATE NO ACTION ON DELETE NO ACTION,
> CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)
> REFERENCES popt_2017.field (id) MATCH SIMPLE
> ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
> OIDS=FALSE
> );
> 
> 
> Below are the index definitions for those tables:
> 
> 
> UNIT:
> CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id);
> CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id);
> 
> 
> FIELD:
> CREATE UNIQUE INDEX field_pkey ON field USING btree (id)
> CREATE INDEX field_unit_id_idx ON field USING btree (unit_id)
> CREATE INDEX field_subunit_id_idx ON field USING btree
> (subunit_data_id)
> CREATE INDEX field_field_name_idx ON field USING btree (field_name)
> 
> 
> ANSWER:
> CREATE UNIQUE INDEX answer_pkey ON answer USING btree (id)
> CREATE INDEX answer_field_id_idx ON answer USING btree (field_id)
> CREATE INDEX answer_ans_idx ON answer USING btree (ans)
> 
> 
> The tables each have the following number of rows:
> 
> 
> UNIT: 10,315
> FIELD: 139,397,965
> ANSWER: 3,463,300
> 
> 
> The query in question is:
> 
> 
> SELECT
> UNIT.ID AS UNIT_ID,
> UNIT.UNIT_ID AS UNIT_UNIT_ID,
> UNIT.BATCH_ID AS UNIT_BATCH_ID,
> UNIT.CREATE_DATE AS UNIT_CREATE_DATE,
> UNIT.UPDATE_DATE AS UNIT_UPDATE_DATE
> FROM
> UNIT, FIELD, ANSWER
> WHERE
> UNIT.ID =FIELD.UNIT_ID AND
> FIELD.ID =ANSWER.FIELD_ID AND
> FIELD.FIELD_NAME='SHEETS_PRESENT' AND
> ANSWER.ANS='2';
> 
> 
> I attempted to run an EXPLAIN (ANALYZE,BUFFERS) and the query has
> been running for 32 minutes now, So I won't be able to post the
> results (as I've never been able to get the query to actually
> finish.
> 
> 
> But, if I remove the join to UNIT (and just join FIELD and ANSWER)
> the resulting query is sufficiently fast, (the first time it ran in
> roughly 3 seconds), the query as such is:
> 
> 
> SELECT * FROM
> ANSWER, FIELD
> WHERE
> FIELD.ID =ANSWER.FIELD_ID AND
> FIELD.FIELD_NAME='SHEETS_PRESENT' AND
> ANSWER.ANS='2';
> 
> 
> The EXPLAIN ( ANALYZE, BUFFERS ) output of that query can be found
> here https://explain.depesz.com/s/ueJq
> 
> 
> These tables are static for now, so they do not get DELETEs or
> INSERTS at all and I have run VACUUM ANALYZE on all the affected
> tables.
> 
> 
> I'm running PostgreSQL PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu,
> compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit
> 
> 
> I'm running this on RHEL 6.9
> 
> 
> On a server with 32 GB of ram, 2 CPUs.
> 
> 
> The following are the changes to postgresql.conf that I have made:
> 
> 
> shared_buffers = 7871MB
> effective_cache_size = 23611MB
> work_mem = 1000MB
> maintenance_work_mem = 2048MB
> 
> 
> I have not changed the autovacuum settings, but since the tables are
> static for now and I've already ran VACUUM that should not have any
> effect.
> 
> 
> Any assistance that could be provided is greatly appreciated.
> 
> 
> Thank you,
> Alessandro Ferrucci
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> --
> 
> Signed,
> Alessandro Ferrucci

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance