Re: Speeding up query, Joining 55mil and 43mil records.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello again,

thanks for all the quick replies.

It seems i wasn't entirely correct on my previous post, i've mixed up some times/numbers.

Below the correct numbers

MSSQL:      SELECT COUNT(*) from JOIN (without insert)   17 minutes
PostgreSQL: SELECT COUNT(*) from JOIN (without insert)   33 minutes
PostgreSQL: complete query                               55 minutes

The part i'm really troubled with is the difference in performance for the select part. Which takes twice as long on PostgreSQL even though it has a better server then MSSQL.

Changed i've made to postgressql.conf

work_mem = 524288 (1GB, results in out of memory error)
checkpoints_segments = 256
checkpoints_timeout = 3600
checkpoints_warning = 0


I've ran the complete 'explain analyse query' twice. First with pgsql_tmp on the same disk, then again with pgsql_tmp on a seperate disk.

**** (PostgreSQL) (*pgsql_tmp on same disk*):

Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118) (actual time=327982.425..1903423.769 rows=7551616 loops=1)
 Hash Cond: (("outer".id)::text = ("inner".id)::text)
-> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52) (actual time=8.935..613455.204 rows=37368390 loops=1) Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL))) -> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80) (actual time=327819.698..327819.698 rows=8761024 loops=1) -> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80) (actual time=75911.336..295510.647 rows=8761024 loops=1) Recheck Cond: (date_part('year'::text, datum) > 2004::double precision) -> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0) (actual time=75082.080..75082.080 rows=8761024 loops=1) Index Cond: (date_part('year'::text, datum) > 2004::double precision)
Total runtime: 3355696.015 ms


**** (PostgreSQL) (*pgsql_tmp on seperate disk*)

Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118) (actual time=172797.736..919869.708 rows=7551616 loops=1)
 Hash Cond: (("outer".id)::text = ("inner".id)::text)
-> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52) (actual time=0.015..362154.822 rows=37368390 loops=1) Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL))) -> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80) (actual time=172759.255..172759.255 rows=8761024 loops=1) -> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80) (actual time=4244.840..142144.606 rows=8761024 loops=1) Recheck Cond: (date_part('year'::text, datum) > 2004::double precision) -> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0) (actual time=3431.361..3431.361 rows=8761024 loops=1) Index Cond: (date_part('year'::text, datum) > 2004::double precision)
Total runtime: 2608316.714 ms

A lot of difference in performance. 55 minutes to 42 minutes.


I've ran the 'select count(*) from JOIN' to see the difference on that part.

**** (PostgreSQL) Explain analyse from SELECT COUNT(*) from the JOIN. (*pgsql_tmp on seperate disk*)

Aggregate (cost=5632244.93..5632244.94 rows=1 width=0) (actual time=631993.425..631993.427 rows=1 loops=1) -> Hash Join (cost=1258493.12..5614251.00 rows=7197568 width=0) (actual time=237999.277..620018.706 rows=7551616 loops=1)
       Hash Cond: (("outer".id)::text = ("inner".id)::text)
-> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=14) (actual time=23.449..200532.422 rows=37368390 loops=1) Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::tex (..) -> Hash (cost=1188102.97..1188102.97 rows=8942863 width=14) (actual time=237939.262..237939.262 rows=8761024 loops=1) -> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=14) (actual time=74713.092..216206.478 rows=8761024 loops=1) Recheck Cond: (date_part('year'::text, datum) > 2004::double precision) -> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0) (actual time=73892.153..73892.153 rows=8761024 loops=1) Index Cond: (date_part('year'::text, datum) > 2004::double precision)
Total runtime: 631994.172 ms

A lot of improvement also in the select count: 33 minutes vs 10 minutes.


To us, the speeds are good. Very happy with the performance increase on that select with join, since 90% of the queries are SELECT based.

The query results in 7551616 records, so that's about 4500 inserts per second. I'm not sure if that is fast or not. Any further tips would be welcome.

Thanks everyone.
Nicky


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux