Hello I don't know. I checked similar query on similar dataset (size), and I have a times about 3 sec on much more worse notebook. So problem can be in disk IO operation speed - maybe in access to TOASTed value. 2010/11/21 Humair Mohammed <humairm@xxxxxxxxxxx>: > I am running 64-bit Windows 7 Enterprise with 8G RAM and aÂDual Core 2.67 > Ghz Intel CPU. Both the SQL 2008 R2 and Postgresql are installed on the same > machine. The DISTINCT FROM instead of the COALESCE does not help much.ÂI ran > 2 further tests with work_mem modifications (please note memory usage is > quite low 650kb, so I am not sure if the work_mem is a factor): it's has a little bit different meaning. work_mem is just limit, so "memory usage" must not be great than work_mem ever. if then pg increase "butches" number - store data to blocks on disk. Higher work_mem ~ less butches. So ideal is 1 butches. Regards Pavel Stehule > First, I modified the work_mem setting to 1GB (reloaded config) from the > default 1MB and I see a response time of 33 seconds. Results below from > EXPLAIN ANALYZE: > "Hash Join Â(cost=16212.30..52586.43 rows=92869 width=17) (actual > time=26742.343..33274.317 rows=3163 loops=1)" > " ÂHash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = > (pg.question)::text))" > " ÂJoin Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)" > " Â-> ÂSeq Scan on pivotbad pb Â(cost=0.00..2804.96 rows=93496 width=134) > (actual time=0.055..40.710 rows=93496 loops=1)" > " Â-> ÂHash Â(cost=7537.12..7537.12 rows=251212 width=134) (actual > time=25603.460..25603.460 rows=251212 loops=1)" > " Â Â Â ÂBuckets: 1024 ÂBatches: 64 ÂMemory Usage: 650kB" > " Â Â Â Â-> ÂSeq Scan on pivotgood pg Â(cost=0.00..7537.12 rows=251212 > width=134) (actual time=0.050..120.269 rows=251212 loops=1)" > "Total runtime: 33275.028 ms" > > Second, I modified the work_mem setting to 2GBÂ(reloaded config)Âand I see a > response time of 38 seconds. Results below from EXPLAIN ANALYZE: > "Hash Join Â(cost=16212.30..52586.43 rows=92869 width=17) (actual > time=26574.459..38406.422 rows=3163 loops=1)" > " ÂHash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = > (pg.question)::text))" > " ÂJoin Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)" > " Â-> ÂSeq Scan on pivotbad pb Â(cost=0.00..2804.96 rows=93496 width=134) > (actual time=0.067..37.938 rows=93496 loops=1)" > " Â-> ÂHash Â(cost=7537.12..7537.12 rows=251212 width=134) (actual > time=26426.127..26426.127 rows=251212 loops=1)" > " Â Â Â ÂBuckets: 1024 ÂBatches: 64 ÂMemory Usage: 650kB" > " Â Â Â Â-> ÂSeq Scan on pivotgood pg Â(cost=0.00..7537.12 rows=251212 > width=134) (actual time=0.038..115.319 rows=251212 loops=1)" > "Total runtime: 38406.927 ms" > > By no means I am trying to compare the 2 products. When I noticed the slow > behavior of COALESCE I tried it on SQL Server. And since they are running on > the same machine my comment regarding apples to apples. It is possible that > this is not an apples to apples comparison other than the fact that it is > running on the same machine. > >> From: pavel.stehule@xxxxxxxxx >> Date: Thu, 18 Nov 2010 07:14:24 +0100 >> Subject: Re: Query Performance SQL Server vs. Postgresql >> To: humairm@xxxxxxxxxxx >> CC: pgsql-performance@xxxxxxxxxxxxxx >> >> Hello, >> >> there should be a problem in a statistic, they are out of reality. >> Please, try to use a DISTINCT OF operator now - maybe a statistic will >> be better. Next - try to increase a work_mem. Hash join is >> untypically slow in your comp. >> >> Regards >> >> Pavel Stehule >> >> 2010/11/17 Humair Mohammed <humairm@xxxxxxxxxxx>: >> > Yes strange indeed, I did rerun ANALYZE and VACCUM. Took 70 seconds to >> > rerun >> > the query. Results from EXPLAIN ANALYZE below: >> > "Hash Join Â(cost=16212.30..52586.43 rows=92869 width=17) (actual >> > time=43200.223..49502.874 rows=3163 loops=1)" >> > " ÂHash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text >> > = >> > (pg.question)::text))" >> > " ÂJoin Filter: ((COALESCE(pb.response, 'MISSING'::character >> > varying))::text >> > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)" >> > " Â-> ÂSeq Scan on pivotbad pb Â(cost=0.00..2804.96 rows=93496 >> > width=134) >> > (actual time=0.009..48.200 rows=93496 loops=1)" >> > " Â-> ÂHash Â(cost=7537.12..7537.12 rows=251212 width=134) (actual >> > time=42919.453..42919.453 rows=251212 loops=1)" >> > " Â Â Â ÂBuckets: 1024 ÂBatches: 64 ÂMemory Usage: 650kB" >> > " Â Â Â Â-> ÂSeq Scan on pivotgood pg Â(cost=0.00..7537.12 rows=251212 >> > width=134) (actual time=0.119..173.019 rows=251212 loops=1)" >> > "Total runtime: 49503.450 ms" >> > >> >> From: pavel.stehule@xxxxxxxxx >> >> Date: Wed, 17 Nov 2010 05:47:51 +0100 >> >> Subject: Re: Query Performance SQL Server vs. Postgresql >> >> To: humairm@xxxxxxxxxxx >> >> CC: pgsql-performance@xxxxxxxxxxxxxx >> >> >> >> 2010/11/17 Humair Mohammed <humairm@xxxxxxxxxxx>: >> >> > >> >> > There are no indexes on the tables either in SQL Server or Postgresql >> >> > - >> >> > I am >> >> > comparing apples to apples here.ÂI ran ANALYZE on the postgresql >> >> > tables, >> >> > after that query performance times are still high 42 seconds with >> >> > COALESCE >> >> > and 35 seconds with IS DISTINCT FROM. >> >> > Here is the execution plan from Postgresql for qurey -Âselect pb.id >> >> > from >> >> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question >> >> > = >> >> > pg.question and coalesce(pb.response,'MISSING') <> >> >> > coalesce(pg.response,'MISSING') >> >> > Execution Time: 42 seconds >> >> > "Hash Join Â(cost=16212.30..48854.24 rows=93477 width=17)" >> >> > " ÂHash Cond: (((pb.id)::text = (pg.id)::text) AND >> >> > ((pb.question)::text >> >> > = >> >> > (pg.question)::text))" >> >> > " ÂJoin Filter: ((COALESCE(pb.response, 'MISSING'::character >> >> > varying))::text >> >> > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)" >> >> > " Â-> ÂSeq Scan on pivotbad pb Â(cost=0.00..2804.96 rows=93496 >> >> > width=134)" >> >> > " Â-> ÂHash Â(cost=7537.12..7537.12 rows=251212 width=134)" >> >> > " Â Â Â Â-> ÂSeq Scan on pivotgood pg Â(cost=0.00..7537.12 >> >> > rows=251212 >> >> > width=134)" >> >> >> >> this is little bit strange - did you ANALYZE and VACUUM? >> >> >> >> please send result of EXPLAIN ANALYZE >> >> >> >> Pavel >> >> >> >> > >> >> > And here is the execution plan from SQL Server for query -Âselect >> >> > pb.id >> >> > from >> >> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question >> >> > = >> >> > pg.question and isnull(pb.response,'ISNULL')<> >> >> > Âisnull(pg.response,'ISNULL') >> >> > Execution Time: < 1 second >> >> > Cost: 1% Â|--Parallelism(Gather Streams) >> >> > Cost: 31% Â Â Â |--Hash Match(Inner Join, HASH:([pb].[ID], >> >> > [pb].[Question])=([pg].[ID], [pg].[Question]), >> >> > RESIDUAL:([master].[dbo].[pivotbad].[ID] as >> >> > [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND >> >> > [master].[dbo].[pivotbad].[Question] as >> >> > [pb].[Question]=[master].[dbo].[pivotgood].[Question] as >> >> > [pg].[Question] >> >> > AND >> >> > [Expr1006]<>[Expr1007])) >> >> > ÂÂ ÂCost: 0% Â|--Bitmap(HASH:([pb].[ID], [pb].[Question]), >> >> > DEFINE:([Bitmap1008])) >> >> > ÂÂ Â Â Â Â ÂCost: 0% Â Â|--Compute >> >> > Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] >> >> > as >> >> > [pb].[Response],'ISNULL'))) >> >> > ÂÂ Â Â Â Â ÂCost: Â6% Â |--Parallelism(Repartition Streams, Hash >> >> > Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question])) >> >> > ÂÂ Â Â Â Â ÂCost: 12% Â|--Table >> >> > Scan(OBJECT:([master].[dbo].[pivotbad] >> >> > AS >> >> > [pb])) >> >> > ÂÂ Â Â Â Â ÂCost: 0% |--Compute >> >> > >> >> > Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] >> >> > as >> >> > [pg].[Response],'ISNULL'))) >> >> > ÂÂ Â Â Â Â Â Â ÂCost: 17% |--Parallelism(Repartition Streams, Hash >> >> > Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question])) >> >> > ÂÂ Â Â Â Â Â Â Â Â ÂCost: 33% |--Table >> >> > Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), >> >> > WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as >> >> > [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question]))) >> >> > >> >> > >> >> > >> >> >> From: pavel.stehule@xxxxxxxxx >> >> >> Date: Tue, 16 Nov 2010 08:12:03 +0100 >> >> >> Subject: Re: >> >> >> To: humairm@xxxxxxxxxxx >> >> >> CC: pgsql-performance@xxxxxxxxxxxxxx >> >> >> >> >> >> 2010/11/15 Humair Mohammed <humairm@xxxxxxxxxxx>: >> >> >> > I have 2 tables with a 200,000 rows of data 3 character/string >> >> >> > columns >> >> >> > ID, >> >> >> > Question and Response. The query below compares the data between >> >> >> > the >> >> >> > 2 >> >> >> > tables based on ID and Question and if the Response does not match >> >> >> > between >> >> >> > the left table and the right table it identifies the ID's where >> >> >> > there >> >> >> > is >> >> >> > a >> >> >> > mismatch. Running the query in SQL Server 2008 using the ISNULL >> >> >> > function >> >> >> > take a few milliseconds. Running the same query in Postgresql >> >> >> > takes >> >> >> > over >> >> >> > 70 >> >> >> > seconds. The 2 queries are below: >> >> >> > SQL Server 2008 R2 Query >> >> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id >> >> >> > and >> >> >> > t1.question = t2.question and isnull(t1.response,'ISNULL') <> >> >> >> > isnull(t2.response,'ISNULL') >> >> >> >> >> >> > Postgres 9.1 Query >> >> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id >> >> >> > and >> >> >> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <> >> >> >> > coalesce(t2.response,'ISNULL') >> >> >> > What gives? >> >> >> >> >> >> I think, so must problem can be in ugly predicate >> >> >> coalesce(t1.response,'ISNULL') <> >> >> >> > coalesce(t2.response,'ISNULL') >> >> >> >> >> >> try use a IS DISTINCT OF operator >> >> >> >> >> >> ... AND t1.response IS DISTINCT t2.response >> >> >> >> >> >> Regards >> >> >> >> >> >> Pavel Stehule >> >> >> >> >> >> p.s. don't use a coalesce in WHERE clause if it is possible. >> >> > >> > > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance