Hi all,
I've collected some interesting results during my experiments that I can't explain, and I need your assistance.
I'm running PostgreSQL 9.0 on a quad-core machine with a two-level on-chip cache hierarchy. PostgreSQL has a large, warmed-up buffer cache, so no disk I/O is observed during the experiments (i.e. the buffer cache hit rate is 100% for each query). I pin each query/process to an individual core. The queries are simple read-only queries (selects only), and a nested loop (without materialize) is used for the join operator.
When I pin a single query to an individual core, its execution time is 111 seconds. This is my base case. Then I fire two instances of the same query concurrently and pin them to two different cores; in this case each instance takes 132 seconds. The trend continues for three instances (164 seconds each) and four instances (201 seconds each). I was expecting at least a linear improvement in throughput. I tried several different queries and saw the same trend every time.
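For reference, my measurement setup is roughly what the following Python sketch does (the connection string and query are placeholders, and os.sched_setaffinity here stands in for the actual per-core pinning I use):

import os
import time
from multiprocessing import Process

import psycopg2  # any client library would do; psycopg2 is just an example

DSN = "dbname=test user=postgres"                           # placeholder connection string
QUERY = "SELECT count(*) FROM t1, t2 WHERE t1.id = t2.id"   # placeholder query

def run_pinned(core_id):
    # Pin this worker process to a single core, then run the query once and time it.
    os.sched_setaffinity(0, {core_id})
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    start = time.time()
    cur.execute(QUERY)
    cur.fetchall()
    print("core %d: %.1f s" % (core_id, time.time() - start))
    conn.close()

if __name__ == "__main__":
    n_instances = 2   # I repeat the run with 1, 2, 3 and 4 instances
    procs = [Process(target=run_pinned, args=(i,)) for i in range(n_instances)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()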
I wonder why the execution time of an individual query increases as I increase the number of concurrent instances.
Btw, I don't think on-chip cache hit/miss rates explain it, since L2 cache misses are decreasing as expected. I'm not an expert in PostgreSQL internals; maybe there is lock contention (spinlocks?) even though the queries are read-only. Anyway, all ideas are welcome.
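In case it's useful, one thing I could check while the instances run is whether the backends report lock waits in pg_stat_activity, e.g. with a rough sketch like the one below (placeholder connection string, assuming the 9.0 catalog columns; this would only show heavyweight lock waits, not spinlock contention):

import time
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder connection string
conn.autocommit = True
cur = conn.cursor()
for _ in range(30):
    # "waiting" in 9.0 reflects heavyweight lock waits only.
    cur.execute("SELECT procpid, waiting, current_query "
                "FROM pg_stat_activity "
                "WHERE current_query <> '<IDLE>'")
    for row in cur.fetchall():
        print(row)
    time.sleep(1)
conn.close()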
Thanks in advance,
Regards,
Umut