I have a multi-threaded app. It uses ~22 threads to query Postgres.
Postgres won't use more than 1 CPU core. The 22-threaded app only has 3% CPU utilization because it's mostly waiting on Postgres.
Here's the details:
The app has a "main" thread that reads table A's 11,000,000 rows, one at a time. The main thread spawns a new thread for each row in table A's data. This new thread:
- Opens a connection to the DB.
- Does some calculations on the data, including 1 to 102 SELECTs on table B.
- With an INSERT query, writes a new row to table C.
- Closes the connection.
- Thread dies. Its data is garbage collected eventually.
Physical/software details:
- Core i7 processor--4 physical cores, but OS sees 8 cores via hyper-threading
- 7200 RPM 500 GB HDD
- About 1/2 total RAM is free during app execution
- Windows 7 x64
- Postgres 9.0.4 32-bit (32-bit required for PostGIS)
- App is C# w/ .NET 4.0. PLINQ dispatches threads. Npgsql is Postgres connection tool.
At first, the app pounds all 8 cores. But it quickly tapers off, and only 1 core that's busy. The other 7 cores are barely doing a thing.
Postgres has 9 open processes. 1 process was slamming that 1 busy core. The other 8 Postgres processes were alive but idle.
Each thread creates its own connection. It's not concurrently shared with the main thread or any other threads. I haven't disabled connection pooling; when a thread closes a connection, it's technically releasing it into a pool for later threads to use.
Disk utilization is low. The HDD light is off much more than it is on, and a review of total HDD activity put it between 0% and 10% of total capacity. The HDD busy indicator LED would regularly flicker every 0.1 to 0.3 seconds.
The app runs 2 different queries on table B. The 1st query is run once, the 2nd query can be run up to 101 times. Table C has redundant indexes: every column referenced in the SQL WHERE clauses for both queries are indexed separately and jointly. E.g., if query X references columns Y and Z, there are 3 indexes:
- An index for Y
- An index for Z
- An index for Y and Z
Table C is simple. It has four columns: two integers, a varchar(18), and a boolean. It has no indexes. A primary key on the varchar(18) column is its only constraint.
A generalized version of my INSERT command for table C is:
INSERT INTO raw.C VALUES (:L, :M, :N, :P)
I am using parameters to fill in the 4 values.
I have verified table C manually, and correct data is being stored in it.
Several Google searches suggest Postgres should use multiple cores automatically. I've consulted with Npgsql's developer, and he didn't see how Npgsql itself could force Postgres to one core. (See http://pgfoundry.org/pipermail/npgsql-devel/2011-May/001123.html.)
What can I do to improve this? Could I be inadvertently limiting Postgres to one core?
Aren Cambre