Greg Smith <greg <at> 2ndquadrant.com> writes:

> Looks to me like you're running into a general memory bandwidth issue
> here, possibly one that's made a bit worse by how pgbench works. It's a
> somewhat funky workload Linux systems aren't always happy with, although
> one of your tests had the right configuration to sidestep the worst of
> the problems there. I don't see any evidence that pgbench itself is a
> likely suspect for the issue, but it does shuffle a lot of things around
> in memory relative to transaction time when running this small
> select-only test, and clients can get stuck waiting for it when that
> happens.
>
> To put your results in perspective, I would expect to get around 25K TPS
> running the pgbench setup/test you're doing on a recent 4-core/single
> processor system, and around 50K TPS is normal for an 8-core server
> doing this type of test. And those numbers are extremely sensitive to
> the speed of the underlying RAM even with the CPU staying the same.
>
> I would characterize your results as "getting about 1/2 of the
> CPU+memory performance of an install on a dedicated 8-core system".
> That's not horrible, as long as you have reasonable expectations here,
> which is really the case for any virtualized install I think. I'd
> actually like to launch a more thorough investigation into this
> particular area, exactly how the PostgreSQL bottlenecks shift around on
> EC2 compared to similar dedicated hardware, if I found a sponsor for it
> one day. A bit too much work to do it right just for fun.

I can understand that I will not get as much performance out of an EC2
instance as a dedicated server, but I don't understand why top(1) is
showing 50% CPU utilization. If it were a memory speed problem, wouldn't
top(1) report 100% CPU utilization? Does the kernel really do a context
switch when waiting for a response from RAM? That would surprise me,
because to do a context switch it might need to read from RAM, which
would then also block.

I still worry it is a lock contention or scheduling problem, but I am not
sure how to diagnose it (a rough first check is sketched at the end of
this message). I've seen some references to using dtrace to analyze
PostgreSQL locks, but it looks like it might take a lot of ramp-up time
for me to learn how to use dtrace.

Note that I can peg the CPU by running 8 infinite loops inside or outside
the database. I have only seen the utilization problem when running
queries (with pgbench and my application) against PostgreSQL.

In any case, assuming this is an EC2 memory speed thing, it is going to
be difficult to diagnose application bottlenecks when I cannot rely on
top(1) reporting meaningful CPU stats.

Thank you for your help.
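
As a first sanity check on the lock-contention theory, here is a rough
sketch of what I plan to try (assuming it is run from psql while the
pgbench select-only test is underway); it just lists any heavyweight lock
requests that are currently waiting:

    -- Show lock requests that have not been granted yet.
    SELECT locktype,
           relation::regclass AS relation,
           mode,
           pid
    FROM pg_locks
    WHERE NOT granted;

My understanding is that a select-only pgbench run takes only
AccessShareLock on the tables it reads, so this should normally come back
empty; if it does, any contention is presumably at the lightweight-lock or
spinlock level, which pg_locks does not show.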