The query plan get all columns but I'm using only one column.

Moises Lopez <moylop260@xxxxxxxxxx> · Fri, 24 Apr 2020 16:11:12 -0500

We have production database that has slow queries because of the query get all columns even if I'm using only one column.
The result is slow for tables that there are too much columns
The weird part is that there is environment that I can't reproduce it even if they are using the same postgresql.conf
I didn't find what is the variant/configuration to avoid it
I could reproduce it using the official docker image of postgresql

* Steps to reproduce it

1. Run the following script:
    docker run --name psql1 -d -e POSTGRES_PASSWORD=pwd postgres
    docker exec -it --user=postgres psql1 psql
    # Into docker container
    CREATE DATABASE db;
    \connect db;
    CREATE TABLE link (
    ID serial PRIMARY KEY,
    url VARCHAR (255) NOT NULL,
    name VARCHAR (255) NOT NULL,
    description VARCHAR (255),
    rel VARCHAR (50)
    );
    EXPLAIN (ANALYZE, VERBOSE, BUFFERS) 
        SELECT l1.url
        FROM link l1
        JOIN link l2
          ON l1.url="">
2. See result of the Query Plan:
    QUERY PLAN
    -------------------------------------------------------------------------------------------
    Hash Join  (cost=10.90..21.85 rows=40 width=516) (actual time=0.080..0.081 rows=1 loops=1)
    Output: l1.url
    Hash Cond: ((l1.url)::text = (l2.url)::text)
    Buffers: shared hit=5
    ->  Seq Scan on public.link l1  (cost=0.00..10.40 rows=40 width=516) (actual time=0.010..0.011 rows=1 loops=1)
            Output: l1.id, l1.url, l1.name, l1.description, l1.rel
            Buffers: shared hit=1
    ->  Hash  (cost=10.40..10.40 rows=40 width=516) (actual time=0.021..0.021 rows=1 loops=1)
            Output: l2.url
            Buckets: 1024  Batches: 1  Memory Usage: 9kB
            Buffers: shared hit=1
            ->  Seq Scan on public.link l2  (cost=0.00..10.40 rows=40 width=516) (actual time=0.010..0.011 rows=1 loops=1)
                Output: l2.url
                Buffers: shared hit=1
    Planning Time: 0.564 ms
    Execution Time: 0.142 ms

3. Notice that I'm using only the column "url" for "JOIN" and "SELECT" section,
but the "Output" section is returning all columns.

Is there a manner to avoid returning all columns in order to get a better performance?

Thank you in advance

* PostgreSQL version:

    psql postgres -c "SELECT version()"
        PostgreSQL 12.2 (Debian 12.2-2.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

Changes made to the settings in the postgresql.conf file:  see Server Configuration for a quick way to list them all.
    without changes

Operating system and version:
    cat /etc/os-release
        PRETTY_NAME="Debian GNU/Linux 10 (buster)"

-- 
Moisés López
@moylop260