window function induces full table scan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



When querying a view with a WHERE condition, postgresql normally is able to perform an index scan which reduces time for evaluation dramatically.

However, if a window function is evaluated in the view, postgresql is evaluating the window function before the WHERE condition is applied. This induces a full table scan.

These are the results of EXPLAIN:
-- without window function (non-equivalent)
explain select * from without_window_function where user_id = 43;
          QUERY PLAN
-----------------------------------------------------------------------------------------------
Index Scan using idx_checkin_node_user_id on checkin_node (cost=0.43..26.06 rows=2 width=20)
   Index Cond: (user_id = 43)
   Filter: (((id % 1000) + 1) = 1)

-- with window function
 explain select * from last_position where user_id = 43;
          QUERY PLAN
------------------------------------------------------------------------------------------
Subquery Scan on tmp_last_position (cost=973803.66..1151820.09 rows=2 width=20) Filter: ((tmp_last_position.datepos = 1) AND (tmp_last_position.user_id = 43))
   ->  WindowAgg  (cost=973803.66..1080613.52 rows=4747105 width=32)
         ->  Sort  (cost=973803.66..985671.42 rows=4747105 width=32)
Sort Key: checkin_node.user_id, checkin_node.date, checkin_node.id -> Seq Scan on checkin_node (cost=0.00..106647.05 rows=4747105 width=32)

To work around this, I avoid using a view for that (equivalent):
EXPLAIN SELECT user_id, latitude, longitude
 FROM (
        SELECT
          user_id,
          latitude,
          longitude,
rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos
        FROM checkin_node
        WHERE user_id = 43
      ) AS tmp_last_position
 WHERE datepos = 1; -- takes 2 ms
        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Subquery Scan on tmp_last_position  (cost=39.70..52.22 rows=2 width=20)
   Filter: (tmp_last_position.datepos = 1)
   ->  WindowAgg  (cost=39.70..47.40 rows=385 width=32)
         ->  Sort  (cost=39.70..40.67 rows=385 width=32)
               Sort Key: checkin_node.date, checkin_node.id
-> Index Scan using idx_checkin_node_user_id on checkin_node (cost=0.43..23.17 rows=385 width=32)
                     Index Cond: (user_id = 43)


I would expect postgresql to apply this query plan also for the view last_position. It's 6621ms vs. 2ms, so the speedup is 3310!

Is it a bug in the optimizer?

How to reproduce:
=================
OS: ubuntu 12.04
Postgresql v9.3.2

get some sample data:
wget -qO- http://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz|gunzip -c|dos2unix|awk '{ if (length($0) > 20) print }'>test.csv

execute psql script:


\timing on
BEGIN;
DROP TABLE IF EXISTS checkin_node CASCADE;
CREATE TABLE checkin_node (
  id SERIAL NOT NULL PRIMARY KEY,
  user_id INTEGER NOT NULL,
  date TIMESTAMP NOT NULL,
  latitude DOUBLE PRECISION NOT NULL,
  longitude DOUBLE PRECISION NOT NULL,
  original_id VARCHAR NOT NULL
);
\COPY checkin_node (user_id, date, latitude, longitude, original_id) FROM 'test.csv' WITH DELIMITER E'\t';

ALTER TABLE checkin_node DROP COLUMN original_id;
CREATE INDEX idx_checkin_node_user_id ON checkin_node(user_id);
CREATE INDEX idx_checkin_node_date ON checkin_node(date);

COMMIT;

VACUUM ANALYZE checkin_node;

-- doing window function in a view

DROP VIEW IF EXISTS last_position CASCADE;
CREATE VIEW last_position (user_id, latitude, longitude) AS (
  SELECT user_id, latitude, longitude
  FROM (
         SELECT
           user_id,
           latitude,
           longitude,
rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos
         FROM checkin_node
       ) AS tmp_last_position
  WHERE datepos = 1
);

select * from last_position where user_id = 43; -- takes 6621ms

-- similar view but without window function (non-equivalent)

DROP VIEW IF EXISTS without_window_function CASCADE;
CREATE VIEW without_window_function (user_id, latitude, longitude) AS (
  SELECT user_id, latitude, longitude
  FROM (
         SELECT
           user_id,
           latitude,
           longitude,
           (id % 1000)+1 AS datepos --to not use a constant here
         FROM checkin_node
       ) AS tmp_last_position
  WHERE datepos = 1
);
select * from without_window_function where user_id = 43; -- takes 10ms


-- workaround: avoid using views (equivalent)

SELECT user_id, latitude, longitude
FROM (
       SELECT
         user_id,
         latitude,
         longitude,
rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC) AS datepos
       FROM checkin_node
       WHERE user_id = 43
     ) AS tmp_last_position
WHERE datepos = 1; -- takes 2 ms


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux