On 20 May 2010 06:06, David Jarvis <thangalin@xxxxxxxxx> wrote:
> Hi,
>
> I recently switched to PostgreSQL from MySQL so that I can use PL/R for
> data analysis. The query in MySQL form (against a more complex table
> structure) takes ~5 seconds to run. The query in PostgreSQL I have yet
> to let finish, as it takes over a minute. I think I have the correct
> table structure in place (it is much simpler than the former structure
> in MySQL); however, the query executes a full table scan against the
> parent table's 273 million rows.
>
> Questions
>
> What is the proper way to index the dates to avoid full table scans?
>
> Options I have considered:
>
> GIN
> GiST
> Rewrite the WHERE clause
> Add separate year_taken, month_taken, and day_taken columns to the tables
>
> Details
>
> The HashAggregate in the plan shows a cost of 10006220141.11, which is,
> I suspect, on the astronomically huge side. A full table scan is being
> performed on the measurement table (which itself has neither data nor
> indexes). The table aggregates 237 million rows from its child tables.
> The sluggishness comes from this part of the query:
>
>   m.taken BETWEEN
>     /* Start date. */
>     (extract( YEAR FROM m.taken )||'-01-01')::date AND
>     /* End date. Calculated by checking to see if the end date wraps
>        into the next year. If it does, then add 1 to the current year.
>     */
>     (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
>       sign(
>         (extract( YEAR FROM m.taken )||'-12-31')::date -
>         (extract( YEAR FROM m.taken )||'-01-01')::date ), 0
>     ) AS text)||'-12-31')::date
>
> There are 72 child tables, each having a year index and a station
> index, which are defined as follows:
>
> CREATE TABLE climate.measurement_12_013 (
>   -- Inherited from table climate.measurement: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
>   -- Inherited from table climate.measurement: station_id integer NOT NULL,
>   -- Inherited from table climate.measurement: taken date NOT NULL,
>   -- Inherited from table climate.measurement: amount numeric(8,2) NOT NULL,
>   -- Inherited from table climate.measurement: category_id smallint NOT NULL,
>   -- Inherited from table climate.measurement: flag character varying(1) NOT NULL DEFAULT ' '::character varying,
>   CONSTRAINT measurement_12_013_category_id_check CHECK (category_id = 7),
>   CONSTRAINT measurement_12_013_taken_check CHECK (date_part('month'::text, taken)::integer = 12)
> )
> INHERITS (climate.measurement);
>
> CREATE INDEX measurement_12_013_s_idx
>   ON climate.measurement_12_013
>   USING btree
>   (station_id);
>
> CREATE INDEX measurement_12_013_y_idx
>   ON climate.measurement_12_013
>   USING btree
>   (date_part('year'::text, taken));
>
> (Foreign key constraints to be added later.)
>
> The following query runs abysmally slow due to a full table scan:
>
> SELECT
>   count(1) AS measurements,
>   avg(m.amount) AS amount
> FROM
>   climate.measurement m
> WHERE
>   m.station_id IN (
>     SELECT
>       s.id
>     FROM
>       climate.station s,
>       climate.city c
>     WHERE
>       /* For one city... */
>       c.id = 5182 AND
>
>       /* Where stations are within an elevation range... */
>       s.elevation BETWEEN 0 AND 3000 AND
>
>       /* and within a specific radius... */
>       6371.009 * SQRT(
>         POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
>         (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
>          POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
>       ) <= 50
>   ) AND
>
>   /* Data before 1900 is shaky; insufficient after 2009. */
>   extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND
>
>   /* Whittled down by category... */
>   m.category_id = 1 AND
>
>   /* Between the selected days and years... */
>   m.taken BETWEEN
>     /* Start date. */
>     (extract( YEAR FROM m.taken )||'-01-01')::date AND
>     /* End date. Calculated by checking to see if the end date wraps
>        into the next year. If it does, then add 1 to the current year.
>     */
>     (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
>       sign(
>         (extract( YEAR FROM m.taken )||'-12-31')::date -
>         (extract( YEAR FROM m.taken )||'-01-01')::date ), 0
>     ) AS text)||'-12-31')::date
> GROUP BY
>   extract( YEAR FROM m.taken )
>
> What are your thoughts?
>
> Thank you!
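Not a complete answer, but as a rough, untested sketch of the "Rewrite the
WHERE clause" option listed above: if the year restriction is expressed as a
range on the raw taken column (instead of wrapping taken in extract() and
string concatenation), an ordinary btree index on that column can be used to
satisfy it. The index name below is made up, and the station subquery is
omitted for brevity:

-- Plain btree index on the date column of one child table; each of the 72
-- children would need an analogous index, like the existing _s_idx/_y_idx.
CREATE INDEX measurement_12_013_t_idx
  ON climate.measurement_12_013
  USING btree
  (taken);

-- The year filter as a range on the column itself. This is equivalent to
-- extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009, but the planner can
-- evaluate it with an index on taken rather than scanning every row.
SELECT
  extract( YEAR FROM m.taken ) AS year_taken,
  count(1) AS measurements,
  avg(m.amount) AS amount
FROM
  climate.measurement m
WHERE
  m.category_id = 1 AND
  m.taken >= date '1900-01-01' AND
  m.taken <  date '2010-01-01'
GROUP BY
  extract( YEAR FROM m.taken );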
Could you provide the EXPLAIN output for that slow query?

Thom
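For what it's worth, the plan can be captured without waiting for the query
to complete by prefixing it with plain EXPLAIN, which only plans the
statement; EXPLAIN ANALYZE would also execute it and report actual row counts
and timings. A trimmed, untested example against the quoted tables, again
with the station subquery omitted:

-- Show the estimated plan (no execution) for the aggregate over
-- climate.measurement and its child tables.
EXPLAIN
SELECT
  count(1) AS measurements,
  avg(m.amount) AS amount
FROM
  climate.measurement m
WHERE
  m.category_id = 1 AND
  extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009
GROUP BY
  extract( YEAR FROM m.taken );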