Search Postgresql Archives

Sort and Limit - really nasty query and feature of the day

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Today I rewrote a particularly nasty query involving a UNION ALL between an active table and a huge archive table, some left joins, order by and limit, and it went from 5 minutes to under one second ; however one query became 4 with some glue in between.

EXPLAIN
SELECT * FROM (
SELECT 0 AS archived, id, price, surface, coords, detect_time, type_id, vente, zipcode, city_id, description FROM annonces
UNION ALL
SELECT 1 AS archived, a.id, price, surface, coords, detect_time, type_id, vente, zipcode, city_id, description FROM archive_data a LEFT JOIN archive_ext d ON a.id=d.id ) AS foo
WHERE
	detect_time >= '2006-10-30 16:17:45.064793'
AND type_id IN (1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6)
	AND vente
AND (zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009) OR city_id IN (27595) OR coords && '(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX)
	AND surface IS NOT NULL AND price IS NOT NULL
ORDER BY price/surface
LIMIT 100;

	Here is the messy explain :

 Limit  (cost=333560.35..333560.60 rows=100 width=103)
   ->  Sort  (cost=333560.35..333656.88 rows=38610 width=103)
         Sort Key: (foo.price / (foo.surface)::double precision)
         ->  Result  (cost=133.21..328438.41 rows=38610 width=103)
               ->  Append  (cost=133.21..328245.36 rows=38610 width=103)
-> Bitmap Heap Scan on annonces (cost=133.21..7520.56 rows=1426 width=190) Recheck Cond: ((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) Filter: ((detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])) AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL)) -> BitmapOr (cost=133.21..133.21 rows=4294 width=0) -> Bitmap Index Scan on annonces_zip (cost=0.00..55.91 rows=1761 width=0) Index Cond: ((vente = true) AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) -> Bitmap Index Scan on annonces_city (cost=0.00..42.85 rows=1859 width=0) Index Cond: ((vente = true) AND (city_id = 27595)) -> Bitmap Index Scan on annonces_coords (cost=0.00..33.37 rows=675 width=0) Index Cond: (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box) -> Merge Right Join (cost=59679.03..320338.70 rows=37184 width=182)
                           Merge Cond: (d.id = a.id)
-> Index Scan using archive_ext_pkey on archive_ext d (cost=0.00..252661.12 rows=2976314 width=119) -> Sort (cost=59679.03..59771.99 rows=37184 width=67)
                                 Sort Key: a.id
-> Bitmap Heap Scan on archive_data a (cost=3951.02..56856.32 rows=37184 width=67) Recheck Cond: ((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) Filter: ((detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])) AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL)) -> BitmapOr (cost=3951.02..3951.02 rows=171699 width=0) -> Bitmap Index Scan on archive_data_zip (cost=0.00..1692.62 rows=80610 width=0) Index Cond: ((vente = true) AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) -> Bitmap Index Scan on archive_data_city (cost=0.00..1695.31 rows=80683 width=0) Index Cond: ((vente = true) AND (city_id = 27595)) -> Bitmap Index Scan on archive_data_coords (cost=0.00..535.20 rows=10406 width=0) Index Cond: (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)

I didn't redo the explain analyze, it takes too long ; however the stats and count estimates are pretty good, and it takes a good 5 minutes.

However, the interesting parts of the query are very fast. Let's disassemble it :

EXPLAIN ANALYZE SELECT 0 AS archived, id, price, surface, coords, detect_time, type_id, vente, zipcode, city_id, description FROM annonces
WHERE
detect_time >= '2006-10-30 16:17:45.064793'
AND type_id IN (1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6)
AND vente
AND (zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009) OR city_id IN (27595) OR coords && '(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX)
AND surface IS NOT NULL AND price IS NOT NULL
ORDER BY price/surface
LIMIT 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=7602.40..7602.65 rows=100 width=190) (actual time=19.102..19.163 rows=100 loops=1) -> Sort (cost=7602.40..7605.96 rows=1426 width=190) (actual time=19.100..19.123 rows=100 loops=1)
         Sort Key: (price / (surface)::double precision)
-> Bitmap Heap Scan on annonces (cost=133.21..7527.69 rows=1426 width=190) (actual time=4.255..16.725 rows=910 loops=1) Recheck Cond: ((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) Filter: ((detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])) AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL)) -> BitmapOr (cost=133.21..133.21 rows=4294 width=0) (actual time=2.662..2.662 rows=0 loops=1) -> Bitmap Index Scan on annonces_zip (cost=0.00..55.91 rows=1761 width=0) (actual time=0.518..0.518 rows=1367 loops=1) Index Cond: ((vente = true) AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) -> Bitmap Index Scan on annonces_city (cost=0.00..42.85 rows=1859 width=0) (actual time=0.364..0.364 rows=1316 loops=1) Index Cond: ((vente = true) AND (city_id = 27595)) -> Bitmap Index Scan on annonces_coords (cost=0.00..33.37 rows=675 width=0) (actual time=1.776..1.776 rows=2449 loops=1) Index Cond: (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)

Total runtime: 19.327 ms

	Bitmap saves the day. Now for the other part :	

EXPLAIN ANALYZE SELECT 1 AS archived, id, price, surface, coords, detect_time, type_id, vente, zipcode, city_id FROM archive_data
WHERE
detect_time >= '2006-10-30 16:17:45.064793'
AND type_id IN (1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6)
AND vente
AND (zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009) OR city_id IN (27595) OR coords && '(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX)
AND surface IS NOT NULL AND price IS NOT NULL
ORDER BY price/surface
LIMIT 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=59864.95..59865.20 rows=100 width=67) (actual time=490.718..490.793 rows=100 loops=1) -> Sort (cost=59864.95..59957.91 rows=37184 width=67) (actual time=490.716..490.765 rows=100 loops=1)
         Sort Key: (price / (surface)::double precision)
-> Bitmap Heap Scan on archive_data (cost=3951.02..57042.24 rows=37184 width=67) (actual time=223.720..344.485 rows=27761 loops=1) Recheck Cond: ((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) Filter: ((detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])) AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL)) -> BitmapOr (cost=3951.02..3951.02 rows=171699 width=0) (actual time=144.175..144.175 rows=0 loops=1) -> Bitmap Index Scan on archive_data_zip (cost=0.00..1692.62 rows=80610 width=0) (actual time=38.715..38.715 rows=86909 loops=1) Index Cond: ((vente = true) AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) -> Bitmap Index Scan on archive_data_city (cost=0.00..1695.31 rows=80683 width=0) (actual time=26.576..26.576 rows=85868 loops=1) Index Cond: ((vente = true) AND (city_id = 27595)) -> Bitmap Index Scan on archive_data_coords (cost=0.00..535.20 rows=10406 width=0) (actual time=78.880..78.880 rows=117333 loops=1) Index Cond: (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)
Total runtime: 492.530 ms

So, taken individually, postgres is exceedingly good on the "hard" parts of this query (ie. finding what I want).
	Problem is that, on such a query, it would really pay to :

	- Move the ORDER BY and LIMIT inside the UNION ALL
	Postgres already moves the WHERE conditions.
Obviously, if both sides of the UNION have the same ORDER BY and LIMIT, moving them inside would work well.
	This only works when LIMIT is present, of course.

- Continue moving the ORDER BY and LIMIT inside the LEFT JOIN so it can be performed before the MERGE JOIN This allow merging 100 rows instead of 27761, which could even be done with some other join type like Nested Loop.
	This also only works with LIMIT.

	- and re-sort the final query result since it's an UNION.

I don't think it would change the result since it is a left join, all the rows on the left part are in the end result anyway.
	Is it planned to add this to postgres sometimes ?

In the end I used the two fast SELECTs to get the ids of the items to display and had PHP shove them back into the original view which has more stuff going on inside, I was too lazy to redo it and my search engine. The end result is about 300 times faster...








[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux