Fw: Re: Bitmap scan seem like such a strange choice when "limit 1"

Klaudie Willis <Klaudie.Willis@xxxxxxxxxxxxxx> · Mon, 07 Sep 2020 07:09:42 +0000

> t > '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp 
>Not at all important, but it seems odd to be exclusive of the start and end both. I would >consider including the start with >=
>Michael Lewis  |  Database Engineer
>Entrata

Michael,  funny I was thinking that myself minutes after posting. Perhaps it is that tiny gap that makes a difference; however changing it to t >= '2020....etc' and perfectly matching the partition range, did not change anything of significance in the explain or runtime. :-|

On that other topic, n_distinct, it is for the moment indeed hardcoded to -0,1. I have tried to reset n_distinct, and run analyze with default_target_statistics = 2000; no dice!
However, the cars_ref in question, is present in the most_common_vals of pg_stats, and according to that frequency array, that value occurs with a frequency of 1,7%.  That seems correct.  

select count(*)
from bigtablet
where cars_ref = 1769854207
and  t >= '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp; 
--> 2 817 169

I can add that car_ref in general is quite skewed in its distribution, but I don't think that is the issue here.
I think the key hint is that when targeting the partition child table directly, the plan changes.  See below for "proof"

explain (analyze,buffers) 
select *
from bigtable
where car_ref = 1769854207 and t >= '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp
limit 1

Limit  (cost=24961.76..24962.67 rows=1 width=636) (actual time=1456.315..1456.316 rows=1 loops=1)
  Buffers: shared hit=2377
  ->  Bitmap Heap Scan on bigtable_y2020 bigtable  (cost=24961.76..2640351.94 rows=2874279 width=636) (actual time=1456.313..1456.314 rows=1 loops=1)
        Recheck Cond: (car_ref = 1769854207)
        Filter: ((t >= '2020-01-01 00:00:00'::timestamp without time zone) AND (t < '2021-01-01 00:00:00'::timestamp without time zone))
        Heap Blocks: exact=1
        Buffers: shared hit=2377
        ->  Bitmap Index Scan on bigtable_2020_ref_index  (cost=0.00..24243.19 rows=2874336 width=0) (actual time=721.428..721.428 rows=2817169 loops=1)
              Index Cond: (car_ref = 1769854207)
              Buffers: shared hit=2376
Planning Time: 0.321 ms
Execution Time: 1480.087 ms

explain (analyze,buffers) 
select *
from bigtable_y2020  tt
where car_ref = 1769854207 and t >= '2020-01-01'::timestamp  and  t < '2021-01-01'::timestamp 
limit 1

Limit  (cost=0.57..1.60 rows=1 width=636) (actual time=0.037..0.038 rows=1 loops=1)
  Buffers: shared hit=5
  ->  Index Scan using bigtable_2020_ref_index on bigtable_y2020 tt  (cost=0.57..2967225.58 rows=2874279 width=636) (actual time=0.036..0.036 rows=1 loops=1)
        Index Cond: (car_ref = 1769854207)
        Filter: ((t >= '2020-01-01 00:00:00'::timestamp without time zone) AND (t < '2021-01-01 00:00:00'::timestamp without time zone))
        Buffers: shared hit=5
Planning Time: 0.349 ms
Execution Time: 0.106 ms

best regards 
K