Re: BRIN index worse than sequential scan for large search set

Justin Pryzby <pryzby@xxxxxxxxxxxxx> · Fri, 24 Feb 2023 11:19:44 -0600

On Fri, Feb 24, 2023 at 05:40:55PM +0100, Mickael van der Beek wrote:
> Hello everyone,
> 
> I'm playing around with BRIN indexes so as to get a feel for the feature.
> During my tests, I was unable to make BRIN indexes perform better than a
> sequential scan for queries searching for large value sets (20K values in
> the example down below).

> And now let's query 20K random rows from our 20M total rows:

I didn't try your test, but I think *random* is the problem/explanation.

> By default, this query will not use the BRIN index and simply run a 1.5s
> long sequential scan (hitting 700 MB) and a 2.47s hash join for a total
> 8.7s query time:
> https://explain.dalibo.com/plan/46c3191g8a6c1bc7

> If we force the use of the BRIN index using (`SET LOCAL enable_seqscan =
> OFF;`) the same query will now take 50s with 2.5s spent on the bitmap index
> scan (hitting 470 MB of data) and a whopping 42s on the bitmap heap scan
> (hitting 20 GB of data!):
> https://explain.dalibo.com/plan/7f73bg9172a8b226

That means the planner's cost model correctly preferred a seq scan.

> So I had the following two questions:
> 
>    1. Why is the BRIN index systematically worse than a sequential scan,
>    even when the table is x1000 larger than the search set, physically
>    pre-sorted, dense (fillfactor at 100%) and the search rows are themselves
>    sorted?

The table may be dense, but the tuples aren't.  You're asking to return
1/1000th of the tuples, across the entire table.  Suppose there are ~100
tuples per page, and you need to read about every 10th page.  It makes
sense that it's slow to read a large amount of data nonsequentially.
That's why random_page_cost is several times higher than seq_page_cost.

I would expect brin to win if the pages to be accessed were dense rather
than distributed across the whole table.

>    2. Since we only select the "idx" column, why does the BRIN index not
>    simply return the searched value if included in one of it's ranges?
>    Hitting the actual row data stored in the table seems to be unnessary no?

Because it's necessary to check if the tuple is visible to the current
transaction.  It might be from an uncommited/aborted transaction.

-- 
Justin