We have production database that has slow queries because of the query get all columns even if I'm using only one column.
The result is slow for tables that there are too much columns
The weird part is that there is environment that I can't reproduce it even if they are using the same postgresql.conf
I didn't find what is the variant/configuration to avoid it
I could reproduce it using the official docker image of postgresql
* Steps to reproduce it
1. Run the following script:
docker run --name psql1 -d -e POSTGRES_PASSWORD=pwd postgres
docker exec -it --user=postgres psql1 psql
# Into docker container
CREATE DATABASE db;
\connect db;
CREATE TABLE link (
ID serial PRIMARY KEY,
url VARCHAR (255) NOT NULL,
name VARCHAR (255) NOT NULL,
description VARCHAR (255),
rel VARCHAR (50)
);
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT l1.url
FROM link l1
JOIN link l2
ON l1.url="">
2. See result of the Query Plan:
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Join (cost=10.90..21.85 rows=40 width=516) (actual time=0.080..0.081 rows=1 loops=1)
Output: l1.url
Hash Cond: ((l1.url)::text = (l2.url)::text)
Buffers: shared hit=5
-> Seq Scan on public.link l1 (cost=0.00..10.40 rows=40 width=516) (actual time=0.010..0.011 rows=1 loops=1)
Output: l1.id, l1.url, l1.name, l1.description, l1.rel
Buffers: shared hit=1
-> Hash (cost=10.40..10.40 rows=40 width=516) (actual time=0.021..0.021 rows=1 loops=1)
Output: l2.url
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on public.link l2 (cost=0.00..10.40 rows=40 width=516) (actual time=0.010..0.011 rows=1 loops=1)
Output: l2.url
Buffers: shared hit=1
Planning Time: 0.564 ms
Execution Time: 0.142 ms
3. Notice that I'm using only the column "url" for "JOIN" and "SELECT" section,
but the "Output" section is returning all columns.
* Steps to reproduce it
1. Run the following script:
docker run --name psql1 -d -e POSTGRES_PASSWORD=pwd postgres
docker exec -it --user=postgres psql1 psql
# Into docker container
CREATE DATABASE db;
\connect db;
CREATE TABLE link (
ID serial PRIMARY KEY,
url VARCHAR (255) NOT NULL,
name VARCHAR (255) NOT NULL,
description VARCHAR (255),
rel VARCHAR (50)
);
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT l1.url
FROM link l1
JOIN link l2
ON l1.url="">
2. See result of the Query Plan:
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Join (cost=10.90..21.85 rows=40 width=516) (actual time=0.080..0.081 rows=1 loops=1)
Output: l1.url
Hash Cond: ((l1.url)::text = (l2.url)::text)
Buffers: shared hit=5
-> Seq Scan on public.link l1 (cost=0.00..10.40 rows=40 width=516) (actual time=0.010..0.011 rows=1 loops=1)
Output: l1.id, l1.url, l1.name, l1.description, l1.rel
Buffers: shared hit=1
-> Hash (cost=10.40..10.40 rows=40 width=516) (actual time=0.021..0.021 rows=1 loops=1)
Output: l2.url
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=1
-> Seq Scan on public.link l2 (cost=0.00..10.40 rows=40 width=516) (actual time=0.010..0.011 rows=1 loops=1)
Output: l2.url
Buffers: shared hit=1
Planning Time: 0.564 ms
Execution Time: 0.142 ms
3. Notice that I'm using only the column "url" for "JOIN" and "SELECT" section,
but the "Output" section is returning all columns.
Is there a manner to avoid returning all columns in order to get a better performance?
Thank you in advance
* PostgreSQL version:
psql postgres -c "SELECT version()"
PostgreSQL 12.2 (Debian 12.2-2.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
Changes made to the settings in the postgresql.conf file: see Server Configuration for a quick way to list them all.
without changes
Operating system and version:
cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
Moisés López
@moylop260