Re: High checkpoint_segments

Venkat Balaji <venkat.balaji@xxxxxxxx> · Wed, 15 Feb 2012 18:53:51 +0530

> Data loss would be an issue when there is a server crash or pg_xlog crash

> etc. That many number of pg_xlog files (1000) would contribute to huge

> data

> loss (data changes not synced to the base are not guaranteed). Of-course,

> this is not related to the current situation.  Normally we calculate the

> checkpoint completion time, IO pressure, CPU load and the threat to the

> data loss when we configure checkpoint_segments.

So you're saying that by using small number of checkpoint segments you

limit the data loss when the WAL gets corrupted/lost? That's a bit like

buying a Maseratti and then not going faster than 10mph because you might

crash at higher speeds ...

No. I am not saying that checkpoint_segments must be lower. I was just trying to explain the IO over-head on putting high (as high as 1000) checkpoint segments.  Lower number of checkpoint segments will lead to more frequent IOs which is not good. Agreed.

The problem here is that the WAL is usually placed on more reliable drives

(compared to the data files) or a RAID1 array and as it's just writing

data sequentially, so the usage pattern is much less likely to cause

data/drive corruption (compared to data files that need to handle a lot of

random I/O, etc.).

Agreed.

So while it possible the WAL might get corrupted, the probability of data

file corruption is much higher. And the corruption might easily happen

silently during a checkpoint, so there won't be any WAL segments no matter

how many of them you keep ...

Agreed. When corruption occurs, it really does not matter how many WAL segments are kept in pg_xlog.
But, at any point of time if PG needs 

And by using low number of checkpoint segments it actually gets worse,

because it means more frequent checkpoints -> more I/O on the drives ->

more wearout of the drives etc.

Completely agreed. As mentioned above. I choose checkpoint_segments and checkpoint_timeout once i observe the checkpoint behavior.

If you need to protect yourself against this, you need to keep a WAL

archive (prefferably on a separate machine) and/or a hot standby for

failover.

WAL archiving is a different situation where-in you need to backup the pg_xlog files by enabling archiving.
I was referring to an exclusive situation, where-in pg_xlogs are not archived and data is not yet been synced to base files (by bgwriter) and the system crashed, then PG would depend on pg_xlog to recover and reach the consistent state, if the pg_xlog is also not available, then there would be a data loss and this depends on how much data is present in pg_xlog files.

Thanks,
VB