Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

John Lumby <johnlumby@xxxxxxxxxxx> · Wed, 26 Jun 2019 15:05:04 +0000

Thanks Tom and Peter for those thoughts.

>
> I think you've confused REINDEX with VACUUM.  It seems like a pretty poor
> substitute for that --- it's much more expensive and has worse locking
> requirements.
>

I think the answer is mainly "I wish it were so" ,   but in practice,   even with a reasonably aggressive autovacuum configuration running,  eventually the number of pages and number of dead keys builds up too much.    The assumption (which is the case in our particular workload) is that,  eventually,  unavailable-service maintenance operation must be done,  and REINDEX does (today,  pg-11.x) play a useful role because it addresses various aspects.   And then,  given we find it useful to REINDEX,   we would prefer to avoid the sawtooth / "wave of misery" syndrome.

>
> There is a very recent research paper that discusses the idea of
> varying fillfactor with a view to ameliorating page splits:
> 

Thanks,  I am chewing my way through that.  As you say,  it does address exactly the issues I raised. 
Do you happen to know if their source-code is available somewhere?
( I did see their db is MS SQL Server but it still might provide some portable ideas. )

> 
> I suspect that you might find that the enhancements to B-Tree indexes
> that went into Postgres 12 would help with this workload, especially
> if you notice that this happens with indexes that have a lot of duplicates
>

I had not noticed that,   thanks for pointing it out.  Yes ,  in my workload most of the indexes in question are non-unique and some have very low key card.    I will try out the pg-12 beta when I get a chance.

> 
> For the full background, you might take a look at my pgCon talk:
> https://youtu.be/p5RaATILoiE
>

Is there a pdf or text version?

> 
> Does it seem at all possible that you were affected by either the issue with duplicates,
> 

definitely

> 
> or the issue that is addressed by the "split after new tuple" optimization?
> 

don't know,    yet more research needed.    is there a module or contrib which would tell me?

Thanks again      John Lumby

From: Peter Geoghegan <pg@xxxxxxx>

Sent: June 25, 2019 6:12 PM

To: John Lumby

Cc: pgsql general

Subject: Re: REINDEX : new parameter to preserve current average leaf density as new implicit FILLFACTOR

On Tue, Jun 25, 2019 at 2:45 PM John Lumby <johnlumby@xxxxxxxxxxx> wrote:

> Background :   For some workloads involving high volume of INSERT/UPDATE/DELETE,  It is sometimes beneficial

> to schedule regular REINDEX of high-activity indexes,   so as to improve performance,  or restore performance levels back to what it was earlier,   by removing dead keys etc.      This can result in the average page density of these indexes fluctuating up
 and down in a saw-tooth fashion,  REINDEX causing large increase in density (large drop in total number of pages) and the workload gradually  decreasing density back to some "steady-state".

I suspect that you might find that the enhancements to B-Tree indexes

that went into Postgres 12 would help with this workload, especially

if you notice that this happens with indexes that have a lot of

duplicates. For the full background, you might take a look at my pgCon

talk:

https://youtu.be/p5RaATILoiE

Fair warning: this is a very technical talk.

Does it seem at all possible that you were affected by either the

issue with duplicates, or the issue that is addressed by the "split

after new tuple" optimization? They're both causes of index bloat that

VACUUM cannot usually prevent.

-- 

Peter Geoghegan