Re: INDEX Performance Issue

Mark Davidson <mark@xxxxxxxxxxx> · Mon, 8 Apr 2013 21:18:52 +0100

Wow my results are absolutely appalling compared to both of those which is really interesting. Are you running postgres 9.2.4 on both instances? Any specific configuration changes? 
Thinking there must be something up with my setup to be getting such a low tps compared with you. 

On 8 April 2013 21:02, Vasilis Ventirozos <v.ventirozos@xxxxxxxxx> wrote:

-c 10 means 10 clients so that should take advantage of all your cores (see bellow)

%Cpu0  : 39.3 us, 21.1 sy,  0.0 ni, 38.7 id,  0.9 wa,  0.0 hi,  0.0 si,  0.0 st

%Cpu1  : 38.0 us, 25.0 sy,  0.0 ni, 26.0 id,  4.2 wa,  0.0 hi,  6.8 si,  0.0 st
%Cpu2  : 39.3 us, 20.4 sy,  0.0 ni, 39.0 id,  1.3 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 40.0 us, 18.7 sy,  0.0 ni, 40.0 id,  1.3 wa,  0.0 hi,  0.0 si,  0.0 st

%Cpu4  : 13.9 us,  7.1 sy,  0.0 ni, 79.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 13.1 us,  8.4 sy,  0.0 ni, 78.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 14.8 us,  6.4 sy,  0.0 ni, 78.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

%Cpu7  : 15.7 us,  6.7 sy,  0.0 ni, 77.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

 i am pasting you the results of the same test on a i7-2600 16gb with a sata3 SSD and the results from a VM with 2 cores and a normal 7200 rpm hdd

-- DESKTOP
vasilis@Disorder ~ $ pgbench -c 10 -t 10000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 10

number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 1713.338111 (including connections establishing)
tps = 1713.948478 (excluding connections establishing)

-- VM

postgres@pglab1:~/postgresql-9.2.4/contrib/pgbench$ ./pgbench -c 10 -t 10000 bench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1

query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 1118.976496 (including connections establishing)

tps = 1119.180126 (excluding connections establishing)

i am assuming that you didn't populate your pgbench db with the default values , if you tell me how you did i will be happy to re run the test and see the differences.

On Mon, Apr 8, 2013 at 10:31 PM, Mark Davidson <mark@xxxxxxxxxxx> wrote:

Thanks for your response Vasillis. I've run pgbench on both machines `./pgbench -c 10 -t 10000 pgbench` getting 99.800650 tps on my local machine and 23.825332 tps on the server so quite a significant difference. 

Could this purely be down to the CPU clock speed or is it likely something else causing the issue?
I have run ANALYZE on both databases and tried the queries a number of times on each to make sure the results are consistent, this is the case. 

On 8 April 2013 18:19, Vasilis Ventirozos <v.ventirozos@xxxxxxxxx> wrote:

Hello Mark,
PostgreSQL currently doesn't support parallel query so a faster cpu even if it has less cores would be faster for a single query, about benchmarking you can try pgbench that you will find in the contrib,

the execution plan may be different because of different statistics, have you analyzed both databases when you compared the execution plans ?

Vasilis Ventirozos

Been trying to progress with this today. Decided to setup the database on my local machine to try a few things and I'm getting much more sensible results and a totally different query plan http://explain.depesz.com/s/KGd in this case the query took about a minute but does sometimes take around 80 seconds. 

The config is exactly the same between the two database. The databases them selves are identical with all indexes the same on the tables. 

The server has an 2 x Intel Xeon E5420 running at 2.5Ghz each, 16GB RAM and the database is just on a SATA HDD which is a Western Digital WD5000AAKS.

My desktop has a single i5-3570K running at 3.4Ghz, 16GB RAM and the database is running on a SATA HDD which is a Western Digital WD1002FAEX-0

Could anyone offer any reasoning as to why the plan would be so different across the two machines? I would have thought that the server would perform a lot better since it has more cores or is postgres more affected by the CPU speed? Could anyone suggest a way to bench mark the machines for their postgres performance?

Thanks again for everyones input,

Mark

On 7 April 2013 23:22, Mark Davidson <mark@xxxxxxxxxxx> wrote:

Takes a little longer with the INNER join unfortunately. Takes about ~3.5 minutes, here is the query plan http://explain.depesz.com/s/EgBl. 

With the JOIN there might not be a match if the data does not fall within one of the areas that is selected in the IN query. 

So if we have data id (10) that might fall in areas ( 1, 5, 8, 167 ) but the user might be querying areas ( 200 ... 500 ) so no match in area would be found just to be absolutely clear. 

Is it worth considering adding additional statistics on any of the columns? And / Or additional INDEXES or different types INDEX? Would it be worth restructuring the query starting with areas and working to join data to that? 

On 7 April 2013 16:15, Kevin Grittner <kgrittn@xxxxxxxxx> wrote:

Greg Williamson <gwilliamson39@xxxxxxxxx> wrote:

>> Thanks for your response. I tried doing what you suggested so

>> that table now has a primary key of

>> ' CONSTRAINT data_area_pkey PRIMARY KEY(area_id , data_id ); '

>> and I've added the INDEX of

>> 'CREATE INDEX data_area_data_id_index ON data_area USING btree (data_id );'

Yeah, that is what I was suggesting.

>> unfortunately it hasn't resulted in an improvement of the query

>> performance.

> Did you run analyze on the table after creating the index ?

That probably isn't necessary.  Statistics are normally on relations

and columns; there are only certain special cases where an ANALYZE

is needed after an index build, like if the index is on an

_expression_ rather than a list of columns.

Mark, what happens if you change that left join to a normal (inner)

join?  Since you're doing an inner join to data_area and that has a

foreign key to area, there should always be a match anyway, right?

The optimizer doesn't recognize that, so it can't start from the

area and just match to the appropriate points.

--

Kevin Grittner

EnterpriseDB: http://www.enterprisedb.com

The Enterprise PostgreSQL Company