Hello everyone,
I'm playing around with BRIN indexes to get a feel for the feature.
During my tests, I was unable to make a BRIN index perform better than a sequential scan for queries searching for large value sets (20K values in the example below).
Creating the table with a single required BIGINT column:
CREATE TABLE
test_brin
(
idx BIGINT NOT NULL
)
WITH
(
fillfactor = 100
)
;
Filling the table with 20 million sorted random BIGINT values:
INSERT INTO
test_brin
(
idx
)
SELECT
CAST(FLOOR(RANDOM() * 9223372036854775806) AS BIGINT)
AS idx
FROM
GENERATE_SERIES(1, 20 * 1000 * 1000, 1)
AS g
ORDER BY
idx ASC
;
Now we cluster the table (even though this shouldn't be needed):
CREATE UNIQUE INDEX test_brin_idx_uniq ON test_brin (idx);
CLUSTER test_brin USING test_brin_idx_uniq;
DROP INDEX test_brin_idx_uniq;
Now we create the BRIN index on what should be a perfectly ordered and dense table:
CREATE INDEX
test_brin_idx
ON
test_brin
USING
BRIN
(
idx
)
;
Let's VACUUM the table just to be safe:
VACUUM test_brin;
VACUUM ANALYSE test_brin;
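As a side note, the per-range min/max summaries the index actually stores can be inspected with the `pageinspect` contrib extension (a sketch, assuming the extension is available; in a BRIN index, page 0 is the metapage and page 1 the revmap, so page 2 is usually the first page holding summary tuples):
```sql
-- Requires the pageinspect contrib extension.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Show the min/max summary stored for each block range;
-- "value" prints as e.g. {123 .. 456789}.
SELECT itemoffset, blknum, value
FROM brin_page_items(get_raw_page('test_brin_idx', 2), 'test_brin_idx')
LIMIT 10;
```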
And now let's query 20K random rows from our 20M total rows:
EXPLAIN (
ANALYZE,
VERBOSE,
COSTS,
BUFFERS,
TIMING
)
SELECT
tb.idx
FROM
test_brin
AS tb
WHERE
EXISTS (
SELECT
FROM
(
SELECT
idx
FROM
(
SELECT
-- Just trying to break the optimisation that would recognize "idx" as being an indexed column
(idx + (CEIL(RANDOM()) - 1))::BIGINT
AS idx
FROM
test_brin
ORDER BY
RANDOM()
LIMIT
20000
)
AS t
ORDER BY
idx ASC
)
AS q
WHERE
tb.idx = q.idx
)
;
By default, this query does not use the BRIN index: it runs a 1.5 s sequential scan (hitting 700 MB) and a 2.47 s hash join, for a total query time of 8.7 s:
https://explain.dalibo.com/plan/46c3191g8a6c1bc7
If we force the use of the BRIN index with `SET LOCAL enable_seqscan = OFF;`, the same query now takes 50 s, with 2.5 s spent on the bitmap index scan (hitting 470 MB of data) and a whopping 42 s on the bitmap heap scan (hitting 20 GB of data!):
https://explain.dalibo.com/plan/7f73bg9172a8b226
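For reference, the override was applied inside a transaction, along these lines (a sketch; the `WHERE` clause here is a stand-in, the real test uses the full query above, and `SET LOCAL` only persists until the end of the current transaction):
```sql
BEGIN;
-- Discourage the planner from choosing sequential scans;
-- the setting reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_seqscan = OFF;

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
SELECT tb.idx
FROM test_brin AS tb
WHERE tb.idx = 42;  -- stand-in predicate for the full query shown above

COMMIT;
```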
Since the bitmap heap scan is taking a long time, let's reduce the `pages_per_range` storage parameter from its default value of 128 to 32:
DROP INDEX test_brin_idx;
CREATE INDEX
test_brin_idx
ON
test_brin
USING
BRIN
(
idx
)
WITH
(
pages_per_range = 32
)
;
The query now takes 25 s, half the previous time, with 9.7 s (up from 2.5 s) spent on the bitmap index scan (hitting 2.6 GB of data, up from 470 MB) and 10 s (down from 42 s) on the bitmap heap scan (hitting 4.9 GB of data, down from 20 GB):
https://explain.dalibo.com/plan/64fh5h1daaheeab3
We go a step further and lower `pages_per_range` to 8 (the other extreme).
The query now takes 45 s, close to the initial time, with 38.5 s (up from 2.5 s) spent on the bitmap index scan (hitting 9.8 GB of data, up from 470 MB) and 2.6 s (down from 42 s) on the bitmap heap scan (hitting 1.2 GB of data, down from 20 GB):
https://explain.dalibo.com/plan/431fbde7gb19g6g6
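Some rough arithmetic hints at why the heap scan reads so much. Assuming the ~700 MB heap reported by the first plan (~89,600 pages of 8 kB), 20K uniformly random probe values land in almost every block range, so the bitmap ends up covering most of the table regardless of `pages_per_range`:
```sql
-- Estimated fraction of BRIN block ranges containing at least one of the
-- 20,000 uniformly random probe values (assumes ~89,600 heap pages).
SELECT ppr,
       ceil(89600.0 / ppr)                                          AS ranges,
       round(1 - power(1 - 1 / ceil(89600.0 / ppr), 20000), 4)      AS frac_ranges_hit
FROM (VALUES (128), (32), (8)) AS t(ppr);
-- ppr = 128 -> ~100% of ranges hit; ppr = 32 -> ~99.9%; ppr = 8 -> ~83%
```
In other words, with uniformly distributed probes the min/max summaries can exclude almost nothing, which matches the heap volumes observed in the plans.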
So I had the following two questions:
- Why is the BRIN index systematically worse than a sequential scan, even when the table is 1,000x larger than the search set, physically pre-sorted, dense (fillfactor = 100) and the search rows are themselves sorted? These would seem to be ideal conditions for a BRIN index.
- Since we only select the "idx" column, why does the BRIN index not simply return the searched value when it falls within one of its ranges? Hitting the actual row data stored in the table seems unnecessary, no?
- PostgreSQL version: v14
- Memory: 32GB
- CPUs: 8
Mickael