All right, after some discussion on Stack Overflow, we found out that the incorrect query plan is generated because the query contains a LIMIT keyword. I guess PostgreSQL expects to find the matching rows quickly, which is why it chooses a sequential scan on the table. If I remove LIMIT 1000, everything executes in a few milliseconds and the query plan looks like this:
Hash Join  (cost=2662004.85..14948213.44 rows=22661658 width=138) (actual time=0.105..0.105 rows=0 loops=1)
  Hash Cond: (p."PaymentOrderId" = po."Id")
  ->  Seq Scan on "Payments" p  (cost=0.00..5724570.00 rows=350000000 width=18) (actual time=0.018..0.018 rows=1 loops=1)
  ->  Hash  (cost=2583365.85..2583365.85 rows=2614480 width=120) (actual time=0.046..0.046 rows=0 loops=1)
        Buckets: 8192  Batches: 64  Memory Usage: 0kB
        ->  Hash Join  (cost=904687.05..2583365.85 rows=2614480 width=120) (actual time=0.046..0.046 rows=0 loops=1)
              Hash Cond: (po."UserId" = u."Id")
              ->  Seq Scan on "PaymentOrders" po  (cost=0.00..654767.00 rows=40000000 width=24) (actual time=0.003..0.003 rows=1 loops=1)
              ->  Hash  (cost=850909.04..850909.04 rows=1980881 width=96) (actual time=0.016..0.016 rows=0 loops=1)
                    Buckets: 8192  Batches: 32  Memory Usage: 0kB
                    ->  Hash Join  (cost=1.20..850909.04 rows=1980881 width=96) (actual time=0.016..0.016 rows=0 loops=1)
                          Hash Cond: (u."RoleId" = r."Id")
                          ->  Seq Scan on "Users" u  (cost=0.00..718598.20 rows=30000220 width=80) (actual time=0.002..0.002 rows=1 loops=1)
                          ->  Hash  (cost=1.19..1.19 rows=1 width=16) (actual time=0.009..0.009 rows=0 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 0kB
                                ->  Seq Scan on "Roles" r  (cost=0.00..1.19 rows=1 width=16) (actual time=0.009..0.009 rows=0 loops=1)
                                      Filter: (("Name")::text = 'Moses2333'::text)
                                      Rows Removed by Filter: 15
Total runtime: 0.209 ms

Following Erwin Brandstetter's suggestion, I also tried pushing the query into a subquery and applying LIMIT there:

SELECT *
FROM (
    SELECT *
    FROM   "Roles"         AS r
    JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
    JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
    JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
    WHERE  r."Name" = 'Moses'
) x
LIMIT 1000;

but this also generates the incorrect query plan. Any idea how to solve this without omitting the LIMIT keyword? (One possible workaround is sketched after the quoted message below.) Thanks

> Subject: RE: PostgreSQL strange query plan for my query
> Date: Fri, 16 Nov 2012 13:55:41 +0100
> From: laurenz.albe@xxxxxxxxxx
> To: dato0011@xxxxxxxxxxx; pgsql-performance@xxxxxxxxxxxxxx
>
> David Popiashvili wrote:
> > I have a database with a few hundred million rows. I'm running the
> > following query:
> >
> > SELECT * FROM "Payments" AS p
> > INNER JOIN "PaymentOrders" AS po
> >     ON po."Id" = p."PaymentOrderId"
> > INNER JOIN "Users" AS u
> >     ON u."Id" = po."UserId"
> > INNER JOIN "Roles" AS r
> >     ON u."RoleId" = r."Id"
> > WHERE r."Name" = 'Moses'
> > LIMIT 1000
> >
> > When the WHERE clause finds a match in the database, I get the result
> > in several milliseconds, but if I modify the query and specify a
> > non-existent r."Name" in the WHERE clause, it takes too long to
> > complete. I guess that PostgreSQL is doing a sequential scan on the
> > "Payments" table (which contains the most rows), comparing each row
> > one by one.
> > Isn't PostgreSQL smart enough to check first whether the "Roles"
> > table contains any row with Name 'Moses'?
> >
> > The "Roles" table contains only 15 rows, while "Payments" contains
> > ~350 million.
> >
> > I'm running PostgreSQL 9.2.1.
> >
> > Here are the explain analyze results: http://explain.depesz.com/s/7e7
>
> Can you also show the plan for the good case?
>
> Yours,
> Laurenz Albe
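
For reference, here is a minimal sketch of one such workaround, assuming the table and column names from the thread (not verified against this data set). On PostgreSQL 9.2, a CTE acts as an optimization fence: the inner query is planned and materialized without knowledge of the outer LIMIT, so the planner should pick the same fast hash-join plan as in the no-LIMIT case:

-- Sketch of the CTE workaround ("matches" is an illustrative name).
-- The CTE is an optimization fence in 9.2: the join is planned as if
-- no LIMIT were present, so the planner is not tempted into the
-- seq-scan-and-abort-early plan on "Payments".
WITH matches AS (
    SELECT *
    FROM   "Roles"         AS r
    JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
    JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
    JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
    WHERE  r."Name" = 'Moses'
)
SELECT * FROM matches
LIMIT 1000;

The trade-off is that the CTE is materialized in full before the LIMIT applies, so if the role name does match and the join produces millions of rows, this can be much slower than the original query; it only pays off when the expected result is small or empty.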