On Wed, Oct 24, 2012 at 8:04 AM, Chris Angelico <rosuav@xxxxxxxxx> wrote:
> On Tue, Oct 23, 2012 at 9:51 AM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
>> On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico <rosuav@xxxxxxxxx> wrote:
>>> After reading the comments last week about SSDs, I did some testing of
>>> the ones we have at work - each of my test-boxes (three with SSDs, one
>>> with HDD) subjected to multiple stand-alone plug-pull tests, using
>>> pgbench to provide load. So far, there've been no instances of
>>> PostgreSQL data corruption, but diskchecker.pl reported huge numbers
>>> of errors.
>>
>> Try starting pgbench, then halfway through the checkpoint timeout
>> issue a CHECKPOINT, and while the checkpoint is still running, pull
>> the plug.
>>
>> Then, after bringing the server up (assuming pg starts up), see if
>> pg_dump generates any errors.
>
> Thanks for the tip. I've been flat out at work these past few days and
> haven't gotten around to testing in the middle of a checkpoint, but I
> have done something that might also be of interest. It's inspired by a
> combination of diskchecker and pgbench: a harness that puts the
> database under load and retains a record of what's been done.
>
> In brief: create a table with N (e.g. 100) rows, then spin as fast as
> possible, incrementing a counter against one random row and also
> incrementing the "Total" counter. When the database goes down, wait
> for it to come up again; when it does, check against the local copy of
> the counters and report any discrepancies.
>
> The code's written in Pike, using the same database connection logic
> that we use in our actual application (well, some of our code is C++
> and some is PHP, so this corresponds to one part of our app), so this
> is roughly representative of real usage.
>
> It's about a page or two of code: http://pastebin.com/UNTj642Y

Very cool. Nice little project.
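For anyone who wants the gist without reading the Pike source, the bookkeeping logic described above can be sketched like this. This is just an illustration, not the actual harness: it uses an in-memory dict as a stand-in for the PostgreSQL table (a real run would do the two increments in one transaction over a live connection, and update the local copy only after COMMIT returns). All names here are made up for the sketch.

```python
import random

N_ROWS = 100

# Stand-in for the database table: row id -> counter, plus a "Total" counter.
# In the real harness these live in PostgreSQL and are bumped inside one
# transaction per iteration.
db = {i: 0 for i in range(N_ROWS)}
db_total = 0

# Client-side record of every committed increment.
local = {i: 0 for i in range(N_ROWS)}
local_total = 0

def do_update():
    """One iteration: increment one random row and the Total counter.

    The real harness updates the local copy only after the database
    reports COMMIT, so every transaction the server claimed was durable
    is recorded locally."""
    global db_total, local_total
    row = random.randrange(N_ROWS)
    db[row] += 1
    db_total += 1
    local[row] += 1
    local_total += 1

def check_consistency():
    """After the server comes back up, compare the local record to the DB.

    A row counter that is lower in the database than in the local copy
    means a committed transaction was lost -- i.e. the drive acknowledged
    an fsync it hadn't actually made durable."""
    discrepancies = [r for r in db if db[r] != local[r]]
    if db_total != local_total:
        discrepancies.append("total")
    return discrepancies

for _ in range(1000):
    do_update()
print(check_consistency())  # -> [] (no committed transactions lost)
```

With a real plug-pull in the middle, any non-empty discrepancy list after restart is the smoking gun.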
> Currently, all the key parameters (database connection info (which has
> been censored for the pastebin version), pool size, thread count, etc.)
> are just variables visible in the script, which is simpler than parsing
> command-line arguments.
>
> Is this a useful and plausible testing methodology? It's definitely
> shown up some failures. On a hard disk, all is well as long as the
> write-back cache is disabled; on the SSDs, I can't make them reliable.

Yes, it seems to be quite a good idea, actually.

> Is a single table enough to test for corruption with?

If it fails, definitely; if it passes, maybe.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general