Thanks for the prompt replies so far. I have done some more investigation so that I can answer some of the questions more clearly.

The original shared_buffers was 8GB, and on Friday I did another run using that old value instead of my more recent 1GB limit. There was no noticeable improvement.

I also installed the pg_buffercache extension and, following some articles, ran:

SELECT pg_size_pretty(count(*) * 8192) as ideal_shared_buffers
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE usagecount >= 3;

Under load, the system is using just over 500MB of shared_buffers at usagecount >= 3. Our system is very write-heavy, with all of the big tables written to but not read from (at least during the load test run). Although the database will grow (under load) to 100GB in a few hours and keep growing, the amount of data in shared_buffers, according to the observation above, seems low.

We have the WAL on a different disk from the main tables. The following per-disk figures (from sar) are representative while under our load test:

06:34:01 PM       DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
06:35:01 PM    dev8-0    176.15      25.89    8773.98     49.96      0.83      4.73      3.79     66.85
06:35:01 PM    dev8-1    174.74       0.13    8746.49     50.06      0.81      4.64      3.82     66.81
06:35:01 PM    dev8-2      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
06:35:01 PM    dev8-5      1.41      25.76      27.49     37.74      0.02     14.96      2.31      0.33
06:35:01 PM   dev8-16     15.02       0.00    3994.82    265.95      1.27     84.88      2.76      4.14
06:35:01 PM   dev8-17     15.02       0.00    3994.82    265.95      1.27     84.88      2.76      4.14

dev8-0 to dev8-5 represent the disk where the WAL is; dev8-16 to dev8-17 are where the main PostgreSQL data directory is located.

We have also run bonnie against the WAL disks, and the sar log for that run shows the following (examples during heavy write and during heavy read/write):

heavy write

01:28:01 PM       DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
01:29:01 PM    dev8-0    371.85      12.11  377771.45   1015.97    122.93    330.46      2.69    100.00
01:29:01 PM    dev8-1    371.85      12.11  377771.45   1015.97    122.93    330.46      2.69    100.00
01:29:01 PM    dev8-2      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM    dev8-5      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM   dev8-16      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
01:29:01 PM   dev8-17      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00

heavy read/write

12:56:02 PM       DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
12:57:01 PM    dev8-0    456.32  355622.67  109942.01   1020.26     58.34    126.70      2.17     99.00
12:57:01 PM    dev8-1    456.32  355622.67  109942.01   1020.26     58.34    126.70      2.17     99.00
12:57:01 PM    dev8-2      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM    dev8-5      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM   dev8-16      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00
12:57:01 PM   dev8-17      0.00       0.00       0.00      0.00      0.00      0.00      0.00      0.00

To me this shows that we are not IO-limited; at least, it is not obvious to me how we could be. I have also, previously, run our load test with the WAL written to a RAM disk (purely as a sanity check, to be sure IO was not the bottleneck), and we still had the same problem with sessions waiting on exclusive locks.

As far as checkpoints go, one does happen every 5 minutes and takes about 4.5 minutes, which corresponds to the checkpoint_completion_target of 0.9 we have set.
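For the lock waits, a query along these lines against pg_locks and pg_stat_activity should show which sessions hold the locks that others are waiting on. This is only a sketch, not output from the runs above, and it assumes PostgreSQL 9.2 or later, where pg_stat_activity exposes pid and query (older releases use procpid and current_query):

-- Sessions waiting on a lock, paired with the session holding the conflicting lock.
SELECT waiting.pid        AS waiting_pid,
       waiting.mode       AS waiting_mode,
       waiting_act.query  AS waiting_query,
       holder.pid         AS holding_pid,
       holder.mode        AS holding_mode,
       holder_act.query   AS holding_query
FROM pg_locks waiting
JOIN pg_locks holder
  ON  holder.granted
  AND holder.pid <> waiting.pid
  AND holder.locktype = waiting.locktype
  -- IS NOT DISTINCT FROM matches NULLs too, so this covers both
  -- relation locks and transactionid locks with one join condition.
  AND holder.database      IS NOT DISTINCT FROM waiting.database
  AND holder.relation      IS NOT DISTINCT FROM waiting.relation
  AND holder.page          IS NOT DISTINCT FROM waiting.page
  AND holder.tuple         IS NOT DISTINCT FROM waiting.tuple
  AND holder.transactionid IS NOT DISTINCT FROM waiting.transactionid
JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
JOIN pg_stat_activity holder_act  ON holder_act.pid  = holder.pid
WHERE NOT waiting.granted;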
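Similarly, to cross-check the checkpoint behaviour, something like the following against pg_stat_bgwriter gives a rough picture of how much writing is done by checkpoints versus backends. Again a sketch only; the exact column set varies a little by version (buffers_backend_fsync, for example, only exists from 9.1 on):

-- Checkpoint and background-writer activity since the stats were last reset.
SELECT checkpoints_timed,      -- checkpoints triggered by checkpoint_timeout
       checkpoints_req,        -- checkpoints requested because WAL filled up
       buffers_checkpoint,     -- buffers written out by checkpoints
       buffers_clean,          -- buffers written by the background writer
       buffers_backend,        -- buffers backends had to write themselves
       buffers_backend_fsync   -- fsyncs backends had to perform themselves
FROM pg_stat_bgwriter;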