Re: Query Performance SQL Server vs. Postgresql

Pavel Stehule <pavel.stehule@xxxxxxxxx> · Thu, 18 Nov 2010 07:14:24 +0100

Hello,

there should be a problem in a statistic, they are out of reality.
Please, try to use a DISTINCT OF operator now - maybe a statistic will
be better. Next -  try to increase a work_mem. Hash join is
untypically slow in your comp.

Regards

Pavel Stehule

2010/11/17 Humair Mohammed <humairm@xxxxxxxxxxx>:
> Yes strange indeed, I did rerun ANALYZE and VACCUM. Took 70 seconds to rerun
> the query. Results from EXPLAIN ANALYZE below:
> "Hash Join Â(cost=16212.30..52586.43 rows=92869 width=17) (actual
> time=43200.223..49502.874 rows=3163 loops=1)"
> " ÂHash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
> (pg.question)::text))"
> " ÂJoin Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
> <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
> " Â-> ÂSeq Scan on pivotbad pb Â(cost=0.00..2804.96 rows=93496 width=134)
> (actual time=0.009..48.200 rows=93496 loops=1)"
> " Â-> ÂHash Â(cost=7537.12..7537.12 rows=251212 width=134) (actual
> time=42919.453..42919.453 rows=251212 loops=1)"
> " Â Â Â ÂBuckets: 1024 ÂBatches: 64 ÂMemory Usage: 650kB"
> " Â Â Â Â-> ÂSeq Scan on pivotgood pg Â(cost=0.00..7537.12 rows=251212
> width=134) (actual time=0.119..173.019 rows=251212 loops=1)"
> "Total runtime: 49503.450 ms"
>
>> From: pavel.stehule@xxxxxxxxx
>> Date: Wed, 17 Nov 2010 05:47:51 +0100
>> Subject: Re: Query Performance SQL Server vs. Postgresql
>> To: humairm@xxxxxxxxxxx
>> CC: pgsql-performance@xxxxxxxxxxxxxx
>>
>> 2010/11/17 Humair Mohammed <humairm@xxxxxxxxxxx>:
>> >
>> > There are no indexes on the tables either in SQL Server or Postgresql -
>> > I am
>> > comparing apples to apples here.ÂI ran ANALYZE on the postgresql tables,
>> > after that query performance times are still high 42 seconds with
>> > COALESCE
>> > and 35 seconds with IS DISTINCT FROM.
>> > Here is the execution plan from Postgresql for qurey -Âselect pb.id from
>> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
>> > pg.question and coalesce(pb.response,'MISSING') <>
>> > coalesce(pg.response,'MISSING')
>> > Execution Time: 42 seconds
>> > "Hash Join Â(cost=16212.30..48854.24 rows=93477 width=17)"
>> > " ÂHash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text
>> > =
>> > (pg.question)::text))"
>> > " ÂJoin Filter: ((COALESCE(pb.response, 'MISSING'::character
>> > varying))::text
>> > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
>> > " Â-> ÂSeq Scan on pivotbad pb Â(cost=0.00..2804.96 rows=93496
>> > width=134)"
>> > " Â-> ÂHash Â(cost=7537.12..7537.12 rows=251212 width=134)"
>> > " Â Â Â Â-> ÂSeq Scan on pivotgood pg Â(cost=0.00..7537.12 rows=251212
>> > width=134)"
>>
>> this is little bit strange - did you ANALYZE and VACUUM?
>>
>> please send result of EXPLAIN ANALYZE
>>
>> Pavel
>>
>> >
>> > And here is the execution plan from SQL Server for query -Âselect pb.id
>> > from
>> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
>> > pg.question and isnull(pb.response,'ISNULL')<>
>> > Âisnull(pg.response,'ISNULL')
>> > Execution Time: < 1 second
>> > Cost: 1% Â|--Parallelism(Gather Streams)
>> > Cost: 31% Â Â Â |--Hash Match(Inner Join, HASH:([pb].[ID],
>> > [pb].[Question])=([pg].[ID], [pg].[Question]),
>> > RESIDUAL:([master].[dbo].[pivotbad].[ID] as
>> > [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND
>> > [master].[dbo].[pivotbad].[Question] as
>> > [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question]
>> > AND
>> > [Expr1006]<>[Expr1007]))
>> > ÂÂ ÂCost: 0% Â|--Bitmap(HASH:([pb].[ID], [pb].[Question]),
>> > DEFINE:([Bitmap1008]))
>> > ÂÂ Â Â Â Â ÂCost: 0% Â Â|--Compute
>> > Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as
>> > [pb].[Response],'ISNULL')))
>> > ÂÂ Â Â Â Â ÂCost: Â6% Â |--Parallelism(Repartition Streams, Hash
>> > Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
>> > ÂÂ Â Â Â Â ÂCost: 12% Â|--Table Scan(OBJECT:([master].[dbo].[pivotbad]
>> > AS
>> > [pb]))
>> > ÂÂ Â Â Â Â ÂCost: 0% |--Compute
>> > Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response]
>> > as
>> > [pg].[Response],'ISNULL')))
>> > ÂÂ Â Â Â Â Â Â ÂCost: 17% |--Parallelism(Repartition Streams, Hash
>> > Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
>> > ÂÂ Â Â Â Â Â Â Â Â ÂCost: 33% |--Table
>> > Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]),
>> > WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as
>> > [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))
>> >
>> >
>> >
>> >> From: pavel.stehule@xxxxxxxxx
>> >> Date: Tue, 16 Nov 2010 08:12:03 +0100
>> >> Subject: Re: 
>> >> To: humairm@xxxxxxxxxxx
>> >> CC: pgsql-performance@xxxxxxxxxxxxxx
>> >>
>> >> 2010/11/15 Humair Mohammed <humairm@xxxxxxxxxxx>:
>> >> > I have 2 tables with a 200,000 rows of data 3 character/string
>> >> > columns
>> >> > ID,
>> >> > Question and Response. The query below compares the data between the
>> >> > 2
>> >> > tables based on ID and Question and if the Response does not match
>> >> > between
>> >> > the left table and the right table it identifies the ID's where there
>> >> > is
>> >> > a
>> >> > mismatch. Running the query in SQL Server 2008 using the ISNULL
>> >> > function
>> >> > take a few milliseconds. Running the same query in Postgresql takes
>> >> > over
>> >> > 70
>> >> > seconds. The 2 queries are below:
>> >> > SQL Server 2008 R2 Query
>> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
>> >> > t1.question = t2.question and isnull(t1.response,'ISNULL') <>
>> >> > isnull(t2.response,'ISNULL')
>> >>
>> >> > Postgres 9.1 Query
>> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
>> >> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
>> >> > coalesce(t2.response,'ISNULL')
>> >> > What gives?
>> >>
>> >> I think, so must problem can be in ugly predicate
>> >> coalesce(t1.response,'ISNULL') <>
>> >> > coalesce(t2.response,'ISNULL')
>> >>
>> >> try use a IS DISTINCT OF operator
>> >>
>> >> ... AND t1.response IS DISTINCT t2.response
>> >>
>> >> Regards
>> >>
>> >> Pavel Stehule
>> >>
>> >> p.s. don't use a coalesce in WHERE clause if it is possible.
>> >
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance