Hello everyone,
I'm playing around with BRIN indexes to get a feel for the feature.
During my tests, I was unable to make a BRIN index perform better than a sequential scan for queries searching for large value sets (20K values in the example below).
Creating the table with a single required BIGINT column:
CREATE TABLE
test_brin
(
idx BIGINT NOT NULL
)
WITH
(
fillfactor = 100
)
;
Filling the table with 20 million sorted random BIGINT values:
INSERT INTO
test_brin
(
idx
)
SELECT
CAST(FLOOR(RANDOM() * 9223372036854775806) AS BIGINT)
AS idx
FROM
GENERATE_SERIES(1, 20 * 1000 * 1000, 1)
AS g
ORDER BY
idx ASC
;
Now we cluster the table (even though this shouldn't be needed):
CREATE UNIQUE INDEX test_brin_idx_uniq ON test_brin (idx);
CLUSTER test_brin USING test_brin_idx_uniq;
DROP INDEX test_brin_idx_uniq;
Now we create the BRIN index on what should be a perfectly ordered and dense table:
CREATE INDEX
test_brin_idx
ON
test_brin
USING
BRIN
(
idx
)
;
Let's VACUUM the table just to be safe:
VACUUM test_brin;
VACUUM ANALYSE test_brin;
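As a side note, the per-range min/max summaries the index actually stores can be inspected with the `pageinspect` contrib extension (a sketch, assuming the extension is available; in a BRIN index, page 0 is the metapage and page 1 the revmap, so page 2 is usually the first page holding summary tuples):
```sql
-- Requires the pageinspect contrib extension.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Show the min/max summary stored for each block range;
-- "value" prints as e.g. {123 .. 456789}.
SELECT itemoffset, blknum, value
FROM brin_page_items(get_raw_page('test_brin_idx', 2), 'test_brin_idx')
LIMIT 10;
```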
And now let's query 20K random rows from our 20M total rows:
EXPLAIN (
ANALYZE,
VERBOSE,
COSTS,
BUFFERS,
TIMING
)
SELECT
tb.idx
FROM
test_brin
AS tb
WHERE
EXISTS (
SELECT
FROM
(
SELECT
idx
FROM
(
SELECT
-- Just trying to break the optimisation that would recognize "idx" as being an indexed column
(idx + (CEIL(RANDOM()) - 1))::BIGINT
AS idx
FROM
test_brin
ORDER BY
RANDOM()
LIMIT
20000
)
AS t
ORDER BY
idx ASC
)
AS q
WHERE
tb.idx = q.idx
)
;
By default, this query does not use the BRIN index: it runs a 1.5 s sequential scan (hitting 700 MB) and a 2.47 s hash join, for a total query time of 8.7 s:
https://explain.dalibo.com/plan/46c3191g8a6c1bc7
If we force the use of the BRIN index with `SET LOCAL enable_seqscan = OFF;`, the same query now takes 50 s, with 2.5 s spent on the bitmap index scan (hitting 470 MB of data) and a whopping 42 s on the bitmap heap scan (hitting 20 GB of data!):
https://explain.dalibo.com/plan/7f73bg9172a8b226
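For reference, the override was applied inside a transaction, along these lines (a sketch; the `WHERE` clause here is a stand-in, the real test uses the full query above, and `SET LOCAL` only persists until the end of the current transaction):
```sql
BEGIN;
-- Discourage the planner from choosing sequential scans;
-- the setting reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_seqscan = OFF;

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
SELECT tb.idx
FROM test_brin AS tb
WHERE tb.idx = 42;  -- stand-in predicate for the full query shown above

COMMIT;
```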
Since the bitmap heap scan is taking a long time, let's reduce the `pages_per_range` storage parameter from its default value of 128 to 32:
DROP INDEX test_brin_idx;
CREATE INDEX
test_brin_idx
ON
test_brin
USING
BRIN
(
idx
)
WITH
(
pages_per_range = 32
)
;
The query now takes 25 s, half the previous time, with 9.7 s (up from 2.5 s) spent on the bitmap index scan (hitting 2.6 GB of data, up from 470 MB) and 10 s (down from 42 s) on the bitmap heap scan (hitting 4.9 GB of data, down from 20 GB):
https://explain.dalibo.com/plan/64fh5h1daaheeab3
We go a step further and lower `pages_per_range` to 8 (the other extreme).
The query now takes 45 s, close to the initial time, with 38.5 s (up from 2.5 s) spent on the bitmap index scan (hitting 9.8 GB of data, up from 470 MB) and 2.6 s (down from 42 s) on the bitmap heap scan (hitting 1.2 GB of data, down from 20 GB):
https://explain.dalibo.com/plan/431fbde7gb19g6g6
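Some rough arithmetic hints at why the heap scan reads so much. Assuming the ~700 MB heap reported by the first plan (~89,600 pages of 8 kB), 20K uniformly random probe values land in almost every block range, so the bitmap ends up covering most of the table regardless of `pages_per_range`:
```sql
-- Estimated fraction of BRIN block ranges containing at least one of the
-- 20,000 uniformly random probe values (assumes ~89,600 heap pages).
SELECT ppr,
       ceil(89600.0 / ppr)                                          AS ranges,
       round(1 - power(1 - 1 / ceil(89600.0 / ppr), 20000), 4)      AS frac_ranges_hit
FROM (VALUES (128), (32), (8)) AS t(ppr);
-- ppr = 128 -> ~100% of ranges hit; ppr = 32 -> ~99.9%; ppr = 8 -> ~83%
```
In other words, with uniformly distributed probes the min/max summaries can exclude almost nothing, which matches the heap volumes observed in the plans.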
So I had the following two questions:
- Why is the BRIN index systematically worse than a sequential scan, even when the table is 1,000x larger than the search set, physically pre-sorted, dense (fillfactor = 100) and the search rows are themselves sorted? These would seem to be ideal conditions for a BRIN index.
- Since we only select the "idx" column, why does the BRIN index not simply return the searched value when it falls within one of its ranges? Hitting the actual row data stored in the table seems unnecessary, no?
- PostgreSQL version: v14
- Memory: 32GB
- CPUs: 8
Mickael