On Wed, Feb 15, 2012 at 4:12 PM, Andres Freund <andres@xxxxxxxxxxx> wrote:
> On Wednesday, February 15, 2012 10:38:23 AM Venkat Balaji wrote:
> > On Wed, Feb 15, 2012 at 12:21 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> > > On Tue, Feb 14, 2012 at 10:57 PM, Venkat Balaji <venkat.balaji@xxxxxxxx> wrote:
> > > > all of these 1000 files get filled up in less than 5 mins, there are
> > > > chances that the system will slow down due to high IO and CPU.
> > > As far as I know there is no data loss issue with a lot of checkpoint
> > > segments.
> > Data loss would be an issue when there is a server crash or a pg_xlog crash,
> > etc. That many pg_xlog files (1000) would contribute to a huge data loss
> > (data changes not yet synced to the base are not guaranteed). Of course,
> > this is not related to the current situation. Normally we consider the
> > checkpoint completion time, IO pressure, CPU load and the threat of data
> > loss when we configure checkpoint_segments.
> I think you might be misunderstanding something. A high number of
> checkpoint_segments can lead to slower recovery - all those changes need to
> be reapplied - but it won't lead to lost data. The data inside the WAL will
> be fsynced at appropriate times (commit; background writer; too much written).
Recovery would take time because all the changes in the WAL files in pg_xlog
(of which there are many) must be replayed to reach a consistent state. When
disaster strikes, if the pg_xlog files are not available and the data in the
WAL has not yet been fsynced, then recovery is not possible and the data loss
can be huge. It also depends on how much data has not been fsynced.
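
For reference, a rough sketch of the queries one could run to see how often
checkpoints are being forced by WAL volume and how quickly WAL is being
generated (assuming a 9.1-era server, since the exact version is not stated
in this thread; the xlog function names below are the pre-9.5/pre-10 ones):

    -- Checkpoint-related settings currently in effect
    SHOW checkpoint_segments;
    SHOW checkpoint_timeout;
    SHOW checkpoint_completion_target;

    -- checkpoints_req counts checkpoints forced by filling checkpoint_segments,
    -- checkpoints_timed counts checkpoints triggered by checkpoint_timeout
    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
    FROM pg_stat_bgwriter;

    -- Current WAL insert position and the pg_xlog segment it maps to
    SELECT pg_current_xlog_location(),
           pg_xlogfile_name(pg_current_xlog_location());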
Thanks,
VB