On Sun, Feb 28, 2016 at 2:06 AM, David Rowley <david.rowley@xxxxxxxxxxxxxxx> wrote:
postgres=# CREATE INDEX a_id_nonce_idx ON a (nonce, id);
CREATE INDEX
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3376.57..3376.92 rows=140 width=16) (actual time=144.340..160.875 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2120kB
Buffers: shared hit=368 read=522, temp read=266 written=266
-> Merge Join (cost=69.50..3371.58 rows=140 width=16) (actual time=0.056..88.409 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=368 read=522
-> Index Scan using a_idx on b (cost=0.29..2724.58 rows=83427 width=16) (actual time=0.009..23.834 rows=83427 loops=1)
Buffers: shared hit=171 read=518
-> Index Only Scan using a_id_nonce_idx on a (cost=0.42..2450.58 rows=820 width=8) (actual time=0.041..20.776 rows=83658 loops=1)
Index Cond: (nonce = 64)
Heap Fetches: 83658
Buffers: shared hit=197 read=4
Planning time: 0.241 ms
Execution time: 172.346 ms
HEAD [1], with patch:
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=7320.39..7320.75 rows=144 width=16) (actual time=162.199..181.497 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2120kB
Buffers: shared hit=85780 read=472, temp read=266 written=266
-> Merge Join (cost=808.17..7315.23 rows=144 width=16) (actual time=9.719..101.864 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=85780 read=472
-> Index Scan using a_idx on b (cost=0.29..2721.49 rows=83427 width=16) (actual time=0.059..26.346 rows=83427 loops=1)
Buffers: shared hit=460 read=229
-> Index Scan using a_pkey on a (cost=0.42..24394.88 rows=846 width=8) (actual time=9.651..41.231 rows=237 loops=1)
Filter: (nonce = 64)
Rows Removed by Filter: 87939
Buffers: shared hit=85320 read=243
Planning time: 0.289 ms
Execution time: 195.071 ms
On 27 February 2016 at 11:07, James Parks <james.parks@xxxxxxxxxx> wrote:
>
> CREATE TABLE a (id bigint primary key, nonce bigint);
> CREATE TABLE b (id bigint primary key, a_id bigint not null);
> CREATE INDEX a_idx ON b (a_id);
>
> The query:
>
> SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = ? ORDER BY b.id
> ASC;
>
> (skip down to [1] and [2] to see the query performance)
>
> What I know:
>
> If you force the query planner to use a merge join on the above query, it
> takes 10+ minutes to complete using the data as per below. If you force the
> query planner to use a hash join on the same data, it takes ~200
> milliseconds.
I believe I know what is going on here, but can you please test;
SELECT b.* FROM b WHERE EXISTS (SELECT 1 FROM a ON b.a_id = a.id AND
a.nonce = ?) ORDER BY b.id ASC;
using the merge join plan.
Here's the query plan for that query (slight modifications to get it to run):
postgres=# explain (analyze,buffers) SELECT b.* FROM b WHERE EXISTS (SELECT 1 FROM a WHERE b.a_id = a.id AND a.nonce = 64) ORDER BY b.id ASC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=16639.19..16855.62 rows=86572 width=16) (actual time=145.117..173.298 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2112kB
Buffers: shared hit=86193 read=881, temp read=269 written=269
I/O Timings: read=3.199
-> Merge Semi Join (cost=795.82..8059.09 rows=86572 width=16) (actual time=6.680..91.862 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=86193 read=881
I/O Timings: read=3.199
-> Index Scan using a_idx on b (cost=0.00..3036.70 rows=86572 width=16) (actual time=0.005..25.193 rows=83427 loops=1)
Buffers: shared hit=1064 read=374
I/O Timings: read=1.549
-> Index Scan using a_pkey on a (cost=0.00..26259.85 rows=891 width=8) (actual time=6.663..35.177 rows=237 loops=1)
Filter: (nonce = 64)
Rows Removed by Filter: 87939
Buffers: shared hit=85129 read=507
I/O Timings: read=1.650
Total runtime: 186.825 ms
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=16639.19..16855.62 rows=86572 width=16) (actual time=145.117..173.298 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2112kB
Buffers: shared hit=86193 read=881, temp read=269 written=269
I/O Timings: read=3.199
-> Merge Semi Join (cost=795.82..8059.09 rows=86572 width=16) (actual time=6.680..91.862 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=86193 read=881
I/O Timings: read=3.199
-> Index Scan using a_idx on b (cost=0.00..3036.70 rows=86572 width=16) (actual time=0.005..25.193 rows=83427 loops=1)
Buffers: shared hit=1064 read=374
I/O Timings: read=1.549
-> Index Scan using a_pkey on a (cost=0.00..26259.85 rows=891 width=8) (actual time=6.663..35.177 rows=237 loops=1)
Filter: (nonce = 64)
Rows Removed by Filter: 87939
Buffers: shared hit=85129 read=507
I/O Timings: read=1.650
Total runtime: 186.825 ms
... so, yes, it does indeed get a lot faster
If this performs much better then the problem is due to the merge join
mark/restore causing the join to have to transition through many
tuples which don't match the a.nonce = ? predicate. The mark and
restore is not required for the rewritten query, as this use a semi
join rather than a regular inner join. With the semi join the executor
knows that it's only meant to be matching a single tuple in "a", so
once the first match is found it can move to the next row in the outer
relation without having to restore the scan back to where it started
matching that inner row again.
If I'm right, to get around the problem you could; create index on a
(nonce, id);
postgres=# CREATE INDEX a_id_nonce_idx ON a (nonce, id);
CREATE INDEX
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3376.57..3376.92 rows=140 width=16) (actual time=144.340..160.875 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2120kB
Buffers: shared hit=368 read=522, temp read=266 written=266
-> Merge Join (cost=69.50..3371.58 rows=140 width=16) (actual time=0.056..88.409 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=368 read=522
-> Index Scan using a_idx on b (cost=0.29..2724.58 rows=83427 width=16) (actual time=0.009..23.834 rows=83427 loops=1)
Buffers: shared hit=171 read=518
-> Index Only Scan using a_id_nonce_idx on a (cost=0.42..2450.58 rows=820 width=8) (actual time=0.041..20.776 rows=83658 loops=1)
Index Cond: (nonce = 64)
Heap Fetches: 83658
Buffers: shared hit=197 read=4
Planning time: 0.241 ms
Execution time: 172.346 ms
Looks pretty fast to me. That being said, the number of rows returned by the Index Only Scan seems a bit high, as compared to the results below, so I added your patch below and got [2].
If such an index is out of the question then a patch has been
submitted for review which should fix this problem in (hopefully)
either 9.6 or 9.7
https://commitfest.postgresql.org/9/129/
If you have a test environment handy, it would be nice if you could
test the patch on the current git head to see if this fixes your
problem. The findings would be quite interesting for me. Please note
this patch is for test environments only at this stage, not for
production use.
Can confirm that your patch there seems to improve performance.
HEAD [1], with patch:
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=7320.39..7320.75 rows=144 width=16) (actual time=162.199..181.497 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2120kB
Buffers: shared hit=85780 read=472, temp read=266 written=266
-> Merge Join (cost=808.17..7315.23 rows=144 width=16) (actual time=9.719..101.864 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=85780 read=472
-> Index Scan using a_idx on b (cost=0.29..2721.49 rows=83427 width=16) (actual time=0.059..26.346 rows=83427 loops=1)
Buffers: shared hit=460 read=229
-> Index Scan using a_pkey on a (cost=0.42..24394.88 rows=846 width=8) (actual time=9.651..41.231 rows=237 loops=1)
Filter: (nonce = 64)
Rows Removed by Filter: 87939
Buffers: shared hit=85320 read=243
Planning time: 0.289 ms
Execution time: 195.071 ms
HEAD [1], without patch:
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=7285.48..7285.83 rows=140 width=16) (actual time=710073.059..710089.924 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2120kB
Buffers: shared hit=2149282200 read=243, temp read=266 written=266
-> Merge Join (cost=685.28..7280.49 rows=140 width=16) (actual time=10.022..709878.946 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=2149282200 read=243
-> Index Scan using a_idx on b (cost=0.29..2724.58 rows=83427 width=16) (actual time=0.013..59.103 rows=83427 loops=1)
Buffers: shared hit=689
-> Index Scan using a_pkey on a (cost=0.42..24404.69 rows=820 width=8) (actual time=9.998..709595.905 rows=83658 loops=1)
Filter: (nonce = 64)
Rows Removed by Filter: 2201063696
Buffers: shared hit=2149281511 read=243
Planning time: 0.297 ms
Execution time: 710101.931 ms
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=7285.48..7285.83 rows=140 width=16) (actual time=710073.059..710089.924 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2120kB
Buffers: shared hit=2149282200 read=243, temp read=266 written=266
-> Merge Join (cost=685.28..7280.49 rows=140 width=16) (actual time=10.022..709878.946 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=2149282200 read=243
-> Index Scan using a_idx on b (cost=0.29..2724.58 rows=83427 width=16) (actual time=0.013..59.103 rows=83427 loops=1)
Buffers: shared hit=689
-> Index Scan using a_pkey on a (cost=0.42..24404.69 rows=820 width=8) (actual time=9.998..709595.905 rows=83658 loops=1)
Filter: (nonce = 64)
Rows Removed by Filter: 2201063696
Buffers: shared hit=2149281511 read=243
Planning time: 0.297 ms
Execution time: 710101.931 ms
Thank you for your response! This allows us to understand the situations in which we can run into trouble, and what we can do in most, if not all cases, to resolve.
Regards,
James
[1] HEAD is "a892234 Change the format of the VM fork to add a second bit per page." as found on the Github mirror of postgres
[2]
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=11000.03..11001.07 rows=417 width=16) (actual time=109.739..129.031 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2120kB
Buffers: shared hit=886, temp read=266 written=266
-> Merge Join (cost=0.71..10981.88 rows=417 width=16) (actual time=0.050..55.701 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=886
-> Index Scan using a_idx on b (cost=0.29..3975.70 rows=83427 width=16) (actual time=0.017..23.647 rows=83427 loops=1)
Buffers: shared hit=689
-> Index Only Scan using a_id_nonce_idx on a (cost=0.42..6787.31 rows=2451 width=8) (actual time=0.028..0.172 rows=237 loops=1)
Index Cond: (nonce = 64)
Heap Fetches: 237
Buffers: shared hit=197
Planning time: 0.153 ms
Execution time: 142.994 ms
[2]
postgres=# explain (analyze,buffers) SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = 64 ORDER BY b.id ASC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=11000.03..11001.07 rows=417 width=16) (actual time=109.739..129.031 rows=83427 loops=1)
Sort Key: b.id
Sort Method: external merge Disk: 2120kB
Buffers: shared hit=886, temp read=266 written=266
-> Merge Join (cost=0.71..10981.88 rows=417 width=16) (actual time=0.050..55.701 rows=83427 loops=1)
Merge Cond: (b.a_id = a.id)
Buffers: shared hit=886
-> Index Scan using a_idx on b (cost=0.29..3975.70 rows=83427 width=16) (actual time=0.017..23.647 rows=83427 loops=1)
Buffers: shared hit=689
-> Index Only Scan using a_id_nonce_idx on a (cost=0.42..6787.31 rows=2451 width=8) (actual time=0.028..0.172 rows=237 loops=1)
Index Cond: (nonce = 64)
Heap Fetches: 237
Buffers: shared hit=197
Planning time: 0.153 ms
Execution time: 142.994 ms