Re: Weird XFS WAL problem

Greg Smith <greg@xxxxxxxxxxxxxxx> · Sat, 05 Jun 2010 18:50:27 -0400

Kevin Grittner wrote:
I don't know at the protocol level; I just know that write barriers
do *something* which causes our controllers to wait for actual disk
platter persistence, while fsync does not.

It's in the docs now:  
http://www.postgresql.org/docs/9.0/static/wal-reliability.html

FLUSH CACHE EXT is the ATA/ATAPI-6 command that filesystems use to
enforce barriers on that type of drive.  Here's what the relevant
portion of the ATA/ATAPI spec says:

"This command is used by the host to request the device to flush the
write cache. If there is data in the write cache, that data shall be
written to the media. The BSY bit shall remain set to one until all
data has been successfully written or an error occurs."

SAS systems have a similar call named SYNCHRONIZE CACHE.

The improvement I actually expect to arrive here first is a reliable
implementation of O_SYNC/O_DSYNC writes.  Both SAS and SATA drives that
are capable of Native Command Queueing support a write type called
"Force Unit Access" (FUA), which is essentially a direct write that
cannot be cached.  When we get more kernels with reliable sync writing
that maps under the hood to FUA, and we can change wal_sync_method to
use it, the need to constantly call fsync for every write to the WAL
will go away.  Then the "blow out the RAID cache when barriers are on"
behavior will only show up during checkpoint fsyncs, which will make
things a lot better (albeit still not ideal).
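
To make the difference between the two styles concrete, here is a
minimal sketch in Python of "write, then fsync" versus "open with
O_DSYNC and just write".  These roughly correspond to the fsync and
open_datasync settings of wal_sync_method.  The function names, the
record size and the temporary file names are made up for illustration,
and the fallback to O_SYNC where O_DSYNC is missing is an assumption.
Nothing here shows whether the kernel actually turns the synchronous
write into a FUA write; that depends on the kernel, filesystem and
drive.

```python
import os
import tempfile

# Assumption: prefer O_DSYNC, fall back to O_SYNC, or to no flag at all
# on platforms that expose neither.
SYNC_FLAG = getattr(os, "O_DSYNC", None) or getattr(os, "O_SYNC", 0)


def write_all(fd, data):
    """Write the whole buffer, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_then_fsync(path, records):
    """Plain writes, each followed by an explicit fsync() call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for rec in records:
            write_all(fd, rec)
            os.fsync(fd)  # flushes everything dirty for this file
    finally:
        os.close(fd)


def write_synchronously(path, records):
    """Open with a sync flag so each write() returns only after the
    kernel reports the data as stable; no separate fsync() call."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | SYNC_FLAG
    fd = os.open(path, flags, 0o600)
    try:
        for rec in records:
            write_all(fd, rec)
    finally:
        os.close(fd)


def demo():
    """Write four hypothetical 8 kB records both ways; return file sizes."""
    records = [b"x" * 8192 for _ in range(4)]
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "wal_fsync")
        b = os.path.join(tmp, "wal_dsync")
        write_then_fsync(a, records)
        write_synchronously(b, records)
        return os.path.getsize(a), os.path.getsize(b)
```

Both functions leave the same bytes in the file.  The difference is
only in how durability is requested: once per write via the open flag,
or through a separate fsync call that may force the controller to flush
far more than the one WAL record.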

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx   www.2ndQuadrant.us

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)