Re: index fragmentation on insert-only table with non-unique column

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 3, 2016 at 8:54 PM, Justin Pryzby <pryzby@xxxxxxxxxxxxx> wrote:
> As a test, I did SET effective_cache_size='1MB', before running explain, and
> still does:
>
> |        ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx on cdrs_huawei_pgwrecord_2016_05_29  (cost=0.44..1526689.49 rows=8342796 width=355)
> |              Index Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 00:00:00'::timestamp without time zone))
>
> I Set enable_indexscan=0, and got:
>
> |        ->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_29  (cost=168006.10..4087526.04 rows=8342796 width=355)
> |              Recheck Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 00:00:00'::timestamp without time zone))
> |              ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx  (cost=0.00..165920.40 rows=8342796 width=0)
> |                    Index Cond: ((recordopeningtime >= '2016-05-29 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 00:00:00'::timestamp without time zone))
>
> Here's a minimal query which seems to isolate the symptom:
>
> ts=# explain (analyze,buffers) SELECT sum(duration) FROM cdrs_huawei_pgwrecord_2016_05_22 WHERE recordopeningtime>='2016-05-22' AND recordopeningtime<'2016-05-23';
> | Aggregate  (cost=2888731.67..2888731.68 rows=1 width=8) (actual time=388661.892..388661.892 rows=1 loops=1)
> |   Buffers: shared hit=4058501 read=1295147 written=35800
> |   ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_22_recordopeningtime_idx on cdrs_huawei_pgwrecord_2016_05_22  (cost=0.56..2867075.33 rows=8662534 w
> |idth=8) (actual time=0.036..379332.910 rows=8575673 loops=1)
> |         Index Cond: ((recordopeningtime >= '2016-05-22 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-23 00:00:00'::timestamp
> | without time zone))
> |         Buffers: shared hit=4058501 read=1295147 written=35800
> | Planning time: 0.338 ms
> | Execution time: 388661.947 ms
>
> And here's an older one to avoid cache, with enable_indexscan=0
> |ts=# explain (analyze,buffers)  SELECT sum(duration) FROM cdrs_huawei_pgwrecord_2016_05_08 WHERE recordopeningtime>='2016-05-08' AND recordopeningtime<'2016-05-09';
> | Aggregate  (cost=10006286.58..10006286.59 rows=1 width=8) (actual time=44219.156..44219.156 rows=1 loops=1)
> |   Buffers: shared hit=118 read=1213887 written=50113
> |   ->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_08  (cost=85142.24..9985848.96 rows=8175048 width=8) (actual time=708.024..40106.062 rows=8179338 loops=1)
> |         Recheck Cond: ((recordopeningtime >= '2016-05-08 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-09 00:00:00'::timestamp without time zone))
> |         Rows Removed by Index Recheck: 74909
> |         Heap Blocks: lossy=1213568
> |         Buffers: shared hit=118 read=1213887 written=50113
> |         ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2016_05_08_recordopeningtime_idx1  (cost=0.00..83098.48 rows=8175048 width=0) (actual time=706.557..706.557 rows=12135680 loops=1)
> |               Index Cond: ((recordopeningtime >= '2016-05-08 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-09 00:00:00'::timestamp without time zone))
> |               Buffers: shared hit=117 read=320
> | Planning time: 214.786 ms
> | Execution time: 44228.874 ms
> |(12 rows)


Correct me if I'm wrong, but this looks like the planner not
accounting for correlation when using bitmap heap scans.

Checking the source, it really doesn't.

So correlated index scans look extra favourable vs bitmap index scans
because bitmap heap scans consider random page costs sans correlation
effects (even though correlation applies to bitmap heap scans as
well). While that sounds desirable a priori, it seems it's hurting
this case quite badly.

I'm not sure there's any simple way of working around that.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux