"# Brad's el-ghetto do-our-storage-stacks-lie?-script" I like it already :)
I may play around with that. Looks interesting. For everyone else, here's a post describing the use of diskchecker: http://brad.livejournal.com/2116715.html
I experimented with sysbench today, which was somewhat enlightening: it clearly shows the impact that fsync/fdatasync has on file system performance. Judging by the throughput of each test, it's pretty obvious that fsync is actually writing out to disk.
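A rough sketch of the same kind of measurement, in Python rather than sysbench and assuming a Unix system. It times 8 KiB writes with no sync, with fsync, and with fdatasync after each write, in a directory you choose (point it at the file system under test, not a tmpfs /tmp). The block size and write count are arbitrary choices, not sysbench defaults.

```python
import os
import tempfile
import time


def sync_rate(sync, n_writes=200, block=b"x" * 8192, directory=None):
    """Return writes per second when `sync` is called after each write.

    `sync` is os.fsync, os.fdatasync, or None (no sync, page cache only).
    `directory` should be on the file system being tested.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, block)
            if sync is not None:
                sync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return n_writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # os.fdatasync is not available on every platform (e.g. macOS); fall back to fsync.
    fdatasync = getattr(os, "fdatasync", os.fsync)
    for name, fn in [("none", None), ("fsync", os.fsync), ("fdatasync", fdatasync)]:
        print(f"{name:>9}: {sync_rate(fn):10.0f} writes/s")
```

As a sanity check on the numbers: a single 7200 rpm disk with no write cache can complete roughly 120 synced writes per second (about one per rotation, 7200 / 60). Rates in the thousands without a battery-backed controller cache suggest something in the stack is acknowledging writes it has not made durable.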
Using pgbench is a good idea, as I can throw a high transaction rate at the database and take a snapshot during the test. So far, running pg_dumpall seems to be a fairly reliable way to find the corrupt objects after my initial data load, but unfortunately much of the corruption has been in indexes, which pg_dump will not expose.
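One way to look for index damage that pg_dump cannot see is to count a table's rows twice, once through a sequential scan and once by walking an index, and compare the two. The sketch below only builds the SQL. It assumes a plain (non-partial) btree index on `key` and trusted identifiers, and it steers the planner with the standard enable_seqscan, enable_indexscan and enable_bitmapscan settings. It is a heuristic, not a full consistency check, and `index_check_sql` is a hypothetical helper name.

```python
def index_check_sql(table, key):
    """Return (seq_sql, idx_sql): two ways of counting the rows of `table`.

    seq_sql disables index and bitmap scans, so the count comes from the heap.
    idx_sql discourages sequential scans and orders by `key`, so the planner
    should walk the btree index on `key`. Differing counts point at a damaged
    index (or heap). Identifiers are not quoted; pass trusted names only.
    """
    seq_sql = (
        "BEGIN;\n"
        "SET LOCAL enable_indexscan = off;\n"
        "SET LOCAL enable_bitmapscan = off;\n"
        f"SELECT count(*) FROM {table};\n"
        "COMMIT;"
    )
    idx_sql = (
        "BEGIN;\n"
        "SET LOCAL enable_seqscan = off;\n"
        "SET LOCAL enable_bitmapscan = off;\n"
        f"SELECT count(*) FROM (SELECT {key} FROM {table} ORDER BY {key}) AS s;\n"
        "COMMIT;"
    )
    return seq_sql, idx_sql
```

For example, `index_check_sql("pgbench_accounts", "aid")` against a pgbench database. Prefix each SELECT with EXPLAIN first to confirm the planner really chose the intended scan, since enable_seqscan = off only discourages sequential scans rather than forbidding them.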
Thanks for the input,
T
On Tue, Aug 7, 2012 at 6:11 PM, Craig Ringer <ringerc@xxxxxxxxxxxxx> wrote:
On 08/08/2012 06:23 AM, Terry Schmitt wrote:
> Anyone have a solid method to test if fdatasync is working correctly or
> thoughts on troubleshooting this?
Try diskchecker.pl
https://gist.github.com/3177656
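The idea behind diskchecker.pl, as Brad's post describes it, is to record each write the OS reports as durable somewhere that survives the crash (a listener on a second machine), pull the plug, and then check that every acknowledged write really made it to disk. Below is a minimal local sketch of that idea in Python, assuming Unix (os.pwrite). The record layout and the `confirm` callback are hypothetical, not diskchecker.pl's actual format.

```python
import hashlib
import os
import struct

RECORD = 4096  # bytes per record; an arbitrary choice for this sketch


def make_record(seq):
    """Sequence number plus a SHA-256 digest of it, zero-padded to RECORD bytes."""
    head = struct.pack("<Q", seq)
    return (head + hashlib.sha256(head).digest()).ljust(RECORD, b"\0")


def write_records(path, count, confirm):
    """Write `count` records, fsync after each, and only then call confirm(seq)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for seq in range(count):
            os.pwrite(fd, make_record(seq), seq * RECORD)
            os.fsync(fd)   # the OS claims the record is durable...
            confirm(seq)   # ...so report it to the outside observer
    finally:
        os.close(fd)


def verify(path, confirmed):
    """After reboot: return confirmed seqs whose record is missing or wrong."""
    bad = []
    with open(path, "rb") as f:
        for seq in confirmed:
            f.seek(seq * RECORD)
            if f.read(RECORD) != make_record(seq):
                bad.append(seq)
    return bad
```

In a real test, `confirm` has to send the sequence number off the machine (for example over a socket) before the power is cut; a list kept in memory, fine for a dry run, is lost with the crash.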
The other obvious step: you've changed three things, so start testing them in isolation.
- Test Postgres Plus Advanced Server 8.4, which you knew worked, on your new file system and OS.
- Test PP9.1 on your new OS but with ext3, which you knew worked.
- Test PP9.1 on your new OS but with ext4, which should work if ext3 did.
- Test PP9.1 on a copy of your *old* OS with the old file system setup.
- Test mainline PostgreSQL 9.1 on your new setup to see if it's PP specific.
Since each test sounds moderately time-consuming, you'll probably need to find a way to automate them. I'd first see whether I could reproduce the problem by running pgbench against the setup that's currently failing; if that reproduces the fault, you can use pgbench for the other tests too.
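A small harness for that plan, sketched under assumptions: the five configurations are transcribed from the list as plain labels (actually provisioning each OS, file system and server is left out), and the pgbench invocation uses the standard -i/-s/-c/-j/-T options with arbitrary values. The -j option needs pgbench 9.0 or later. `Setup`, `MATRIX` and `pgbench_commands` are hypothetical names.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """One configuration from the isolation plan (labels only)."""
    server: str      # "PPAS 8.4", "PP 9.1" or "PostgreSQL 9.1"
    os_image: str    # "old" or "new" OS install
    filesystem: str  # "old", "new", "ext3" or "ext4"


MATRIX = [
    Setup("PPAS 8.4", "new", "new"),        # known-good server on the new stack
    Setup("PP 9.1", "new", "ext3"),         # new OS, known-good file system
    Setup("PP 9.1", "new", "ext4"),
    Setup("PP 9.1", "old", "old"),          # copy of the old OS and FS setup
    Setup("PostgreSQL 9.1", "new", "new"),  # is it Postgres Plus specific?
]


def pgbench_commands(dbname, scale=100, clients=16, threads=4, seconds=600):
    """Return (init, run) argument lists: build the pgbench tables, then load them."""
    init = ["pgbench", "-i", "-s", str(scale), dbname]
    run = ["pgbench", "-c", str(clients), "-j", str(threads),
           "-T", str(seconds), dbname]
    return init, run


if __name__ == "__main__":
    init, run = pgbench_commands("bench")
    for setup in MATRIX:
        # Provision `setup` by hand or by script, run these commands
        # (e.g. with subprocess.run), then check for corruption.
        print(setup)
        print("   ", shlex.join(init))
        print("   ", shlex.join(run))
```

Keeping the configurations as data means each run differs from a known-good baseline in one labelled variable, which is the point of isolation testing.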
--
Craig Ringer