Re: turning fsync off for WAL

"Ram Ravichandran" <ramkaka@xxxxxxxxx> · Tue, 3 Jun 2008 00:04:19 -0400

Wow, this is a fascinating situation. Are you sure the fsyncs are the only

thing to worry about though? Postgres will call write(2) many times even if

you disabled fsync entirely. Surely the kernel and filesystem will eventually

send some of them through even if no fsyncs arrive?

Given that I am only worried about WAL being persistent, are these other issues 
still pertinent? I am sorry I am such a newbie.

Is it only fsyncs on the write-ahead-log that matter? Or on the data as well?

Checkpoints fsync the data files. The logs are fsynced on every commit and

also whenever a buffer has to be flushed if the logs for the last changes in

that buffer haven't been synced yet.

I was talking only of WAL. Basically, I am just trying to make sure if my EC2 instance goes down, 
I will be able to recover by replaying my write-ahead-logs. I am assuming checkpoints are for the 
actual tables on the disk (And not for logging / backup). Am I correct?

There actually is an option in Postgres to not call fsync. However your fear

is justified. If your file system can flush buffers to disk in a different

order than they were written (and most can) then it's possible for a database

with fsync off to become corrupted. Typical examples would be things like

records missing index pointers (or worse, index pointers to wrong records), or

duplicate or missing records (consider if an update is only partly written).

This is only an issue in the event of either a kernel crash or power failure

(whatever that means for a virtual machine...). In which case the only safe

course of action is to restore from backup. It's possible that in the context

of Amazon these would be rare enough events and restoring from backups easy

enough that that might be worth considering?

However a safer and more interesting option with Postgres 8.3 would be to

disable "synchronous_commit" and set a very large wal_writer_delay.

Effectively this would do the same thing, disabling fsync for every

transaction, but not risk the data integrity.

The default wal_writer_delay is 200ms meaning 5 fsyncs per second but you

could raise that substantially to get fewer fsyncs, possibly into the range of

minutes. If you raise it *too* far then you'll start observing fsyncs due to

processing being forced to flush dirty buffers before their changes have been

logged and synced. The only way to raise that would be to increase the

shared_buffers which would have complex effects.

This seems like a much better idea. So, I should 
a) disable synchronous_commit 
b) set wal_writer_delay to say 1 minute (and leave fsync on)
c) symlink pg_xlog to the PersistentFS on S3.

If there is a crash, I should be able to restore entirely from the WAL logs. Although, doesn't 
this have the same problem as disabling the fsyncs? 

BTW, if the wal_writer_delay is too long, then the fsyncs to flush dirty buffers would also fsync the
WAL right? Is that bad (as far as data integrity), or is it just that the fsyncs would be more frequent?

Thanks everyone for all the help.

Ram