Here is some additional information:
- When we increased NUM_BUFFER_PARTITIONS to 1024, the problem disappeared on 8-core machines but came back on 16-core machines on Amazon EC2 (a sketch of the build-time change is at the end of this list). Could this be related to PostgreSQL's locking mechanism?
- I tried this test on 4-core machines, including my personal computer and some other Amazon EC2 instances, and did not see the problem there. The problem only appears in PostgreSQL once the core count reaches 8 or more.
- Here are the results of "vmstat 1" while running 8 parallel select count(*) queries (launched roughly as in the shell sketch at the end of this list). Normally I would expect zero idle time.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 29838640 94000 38954740 0 0 0 0 22 21 0 0 100 0 0
7 2 0 29788416 94000 38954740 0 0 0 0 53922 108490 14 24 60 1 1
5 0 0 29747248 94000 38954740 0 0 0 0 68008 164571 22 48 27 2 1
8 0 0 29725796 94000 38954740 0 0 0 0 43587 150574 28 54 16 1 1
0 0 0 29838328 94000 38954740 0 0 0 0 15584 100459 26 55 18 1 0
0 0 0 29838328 94000 38954740 0 0 0 0 42 15 0 0 100 0 0
- When I run 8 wc commands (or other scripts) in parallel, they scale as expected and utilize all CPUs. This leads me to think the problem is related to PostgreSQL rather than the OS.
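
For reference, this is roughly how NUM_BUFFER_PARTITIONS was raised for the test mentioned above. It is only a sketch, assuming a build from source; the constant is a compile-time #define (in the sources I checked it lives in src/include/storage/lwlock.h, but the exact file and default value depend on the PostgreSQL version), so the server has to be rebuilt and restarted for the change to take effect:

# bump the compile-time constant, then rebuild and reinstall the server
sed -i 's/#define NUM_BUFFER_PARTITIONS.*/#define NUM_BUFFER_PARTITIONS  1024/' src/include/storage/lwlock.h
make && make install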
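
And this is roughly how the 8 parallel counts behind the vmstat output are launched; a minimal sketch in which the database name "test" and the table name "t" are placeholders rather than the real names:

# start 8 concurrent backends, each running the same count over the table
for i in $(seq 1 8); do
    psql -d test -c "SELECT count(*) FROM t;" &
done
wait   # blocks until all 8 queries finish; ideally they keep 8 cores busy the whole time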