SSI slows down over time

Hi all,

Disclaimer: this question probably belongs on the hackers list, but the instructions say you have to try somewhere else first... toss-up between this list and a bug report; list seemed more appropriate as a starting point. Happy to file a bug if that's more appropriate, though.

This is with pgsql-9.3.4, x86_64-linux, home-built with `./configure --prefix=...' and gcc-4.7.
TPC-C courtesy of oltpbenchmark.com. 12WH TPC-C, 24 clients.

I see strange behavior across repeated runs: with SSI (SERIALIZABLE), each 100-second run is a bit slower than the one preceding it. Switching to SI (REPEATABLE READ) removes the problem, so it's apparently not due to the database growing. The database is completely shut down (pg_ctl stop) between runs, but the data lives in tmpfs, so there's no I/O problem here. The machine has 64GB RAM, so there's no paging, either.

Note that this slowdown is in addition to the 30% performance hit from using SSI on my 24-core machine. I understand that the latter is a known bottleneck; my question is why the bottleneck should get worse over time:

With SI, I get ~4.4ktps, consistently.
With SSI, I get 3.9, 3.8, 3.4, 3.3, 3.1, 2.9, ... ktps on successive runs.
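
To put the trend in relative terms, here is a minimal sketch that turns the figures above into percentages. It uses only the numbers quoted in this message; the variable and function names are made up for illustration.

```python
# Throughput figures quoted above, in thousands of transactions per second.
si_ktps = 4.4                               # SI (REPEATABLE READ), steady
ssi_ktps = [3.9, 3.8, 3.4, 3.3, 3.1, 2.9]   # SSI (SERIALIZABLE), successive runs


def slowdown_pct(baseline, value):
    """Percentage by which `value` falls short of `baseline`."""
    return (baseline - value) / baseline * 100.0


for run, ktps in enumerate(ssi_ktps, start=1):
    vs_si = slowdown_pct(si_ktps, ktps)
    vs_first = slowdown_pct(ssi_ktps[0], ktps)
    print(f"run {run}: {ktps:.1f} ktps, "
          f"{vs_si:.1f}% below SI, {vs_first:.1f}% below first SSI run")
```

By the sixth run, SSI throughput is roughly a quarter below the first SSI run and roughly a third below SI.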

So the question: what should I look for to diagnose/triage this problem? I'm willing to do some legwork, but have no idea where to go next.

I've tried Linux perf, but all it says is that lots of time is going to LWLock (and callgraph tracing doesn't work in my not-bleeding-edge kernel). Looking through the logs, the abort rates due to SSI aren't changing in any obvious way. I've been hacking on SSI for over a month now as part of a research project, and am fairly familiar with predicate.c, but I don't see any obvious reason this behavior should arise (in particular, SLRU storage seems to be re-initialized every time the postmaster restarts, so there shouldn't be any particular memory effect due to SIREAD locks). I'm also familiar with both Cahill's and Ports/Grittner's published descriptions of SSI, but again, nothing obvious jumps out.

In my experience this sort of behavior indicates a type of bug where fixing it would have a large impact on performance (because the early "damage" is done so quickly that even the very first run doesn't live up to its true potential).

$ cat pgsql.conf
shared_buffers = 8GB
synchronous_commit = off
checkpoint_segments = 64
max_pred_locks_per_transaction = 2000
default_statistics_target = 100
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
effective_cache_size = 40GB
work_mem = 1920MB
wal_buffers = 16MB

Thanks,
Ryan



--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



