On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji <venkat.balaji@xxxxxxxx> wrote:
>>
>> On Wed, Feb 15, 2012 at 1:35 AM, Jay Levitt <jay.levitt@xxxxxxxxx> wrote:
>>>
>>> We need to do a few bulk updates as Rails migrations. We're a typical
>>> read-mostly web site, so at the moment, our checkpoint settings and WAL
>>> are all default (3 segments, 5 min, 16MB), and updating a million rows
>>> takes 10 minutes due to all the checkpointing.
>>>
>>> We have no replication or hot standbys. We're a consumer-web startup
>>> with no SLA and not a huge database, and if we ever do have to recover
>>> from downtime it's OK if it takes longer. Is there a reason NOT to
>>> always run with something like checkpoint_segments = 1000, as long as
>>> I leave the timeout at 5m?
>>
>>
>> Checkpoints will still occur every 5 minutes regardless. And
>> checkpoint_segments = 1000 is huge: it implies up to
>> 16MB * 1000 = 16000MB (~16GB) of pg_xlog data, which is not advisable
>> from an I/O perspective or a data-loss perspective. Even in the
>> unlikely case that all 1000 files fill up in under 5 minutes, the
>> system is liable to slow down under the high I/O and CPU load.
> As far as I know, there is no data-loss issue with a lot of checkpoint segments.
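
For concreteness, the proposal upthread amounts to something like this in
postgresql.conf (a sketch of the settings under discussion, not a
recommendation):

    # postgresql.conf -- settings discussed in this thread (sketch)
    checkpoint_segments = 1000    # default 3; allows up to 16MB * 1000 = ~16GB of pg_xlog
    checkpoint_timeout  = 5min    # default; a timed checkpoint still fires every 5 minutes
    #checkpoint_completion_target = 0.9   # optional: spread checkpoint writes over more of the interval
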
On the data-loss point: it becomes an issue if the server or the pg_xlog storage itself crashes. With that many pg_xlog files (1000), a large amount of changed data may not yet be synced to the base data files, and whatever is lost from the WAL cannot be replayed. Of course, that is not the situation here. Normally we weigh checkpoint completion time, I/O pressure, CPU load, and the exposure to data loss when configuring checkpoint_segments.
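
If it helps while testing: pg_stat_bgwriter (available since 8.3) shows how
often checkpoints run and why, and setting log_checkpoints = on in
postgresql.conf logs each checkpoint's duration. A sketch:

    -- Were checkpoints driven by the timeout or by filling the segments?
    SELECT checkpoints_timed,    -- triggered by checkpoint_timeout
           checkpoints_req,      -- triggered by filling checkpoint_segments
           buffers_checkpoint    -- buffers written out by checkpoints
      FROM pg_stat_bgwriter;

If checkpoints_req stays near zero during the bulk update, it is the
5-minute timeout, not the segment count, that is driving the checkpoints.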