On 04/22/2011 10:04 AM, John Rouillard wrote:
We have a couple of SSDs (2 x 160GB Intel X25-M MLC SATA) acting as the zil (write journal) and are trying to see if they are safe to use in a power-fail situation.
Well, the quick answer is "no". I've lost several weekends of my life to recovering information from databases stored on those drives, after they were corrupted in a crash.
The testing method is to copy a bunch of files over NFS to the server with the zil. While the copy is running, pull the power to the server. The NFS client will stop, and if the client got a message that block X was written safely to the zil, it will continue writing with block X+1. After the server comes back up and the copies resume/finish, the files are checksummed. If block X went missing, the checksums will fail and we will have our proof.
Interestingly, you have reinvented parts of the standard script for testing for data loss, diskchecker.pl: http://brad.livejournal.com/2116715.html
You can get a few thousand commits per second using that program, which is enough to fill the drive buffer such that a power pull should sometimes lose something. I don't think you can do a proper test here using NFS; you really need something that is executing fsync calls directly in the same pattern a database server will.
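To make "executing fsync calls directly" concrete, here is a minimal Python sketch of that write pattern. It is not diskchecker.pl, and the names `write_committed` and `count_intact` are made up for illustration. It assumes a single local file, and it only returns the last acknowledged sequence number; a real test has to report each acknowledgement to a second machine, since anything kept on the server is lost with the power.

```python
import os
import struct

# One 8-byte little-endian sequence number per record (illustrative format).
RECORD = struct.Struct("<Q")


def write_committed(path, count):
    """Append `count` sequence numbers, calling fsync after each one.

    Returns the last sequence number whose fsync returned, i.e. the last
    write the OS claimed was durable.
    """
    last_acked = -1
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            os.write(fd, RECORD.pack(seq))
            os.fsync(fd)      # the call a database issues at commit time
            last_acked = seq  # a real test would report this off-box here
    finally:
        os.close(fd)
    return last_acked


def count_intact(path):
    """Return how many consecutive records (0, 1, 2, ...) are in the file."""
    expected = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file or a torn final record
            (seq,) = RECORD.unpack(chunk)
            if seq != expected:
                break  # gap or corruption
            expected += 1
    return expected
```

After a power pull, if `count_intact` reports fewer records than the last sequence number acknowledged before the crash plus one, something in the stack acknowledged an fsync for data that never reached stable storage.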
ZFS is more resilient than most filesystems as far as avoiding file corruption in this case. But you should still be able to find some missing transactions that were sitting in the drive cache.
--
Greg Smith   2ndQuadrant US   greg@xxxxxxxxxxxxxxx   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books