Search Postgresql Archives

Huge performance penalty with parallel queries in Windows x64 v. Linux x64

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



When developing a solution for a new customer request I created a new query over the production data.

Despite the relatively low row counts of the involved tables (all < 100k) I noticed quite a long execution time of about 85 ms to 100 ms.

The explain anaylze plan showed a parallel execution plan with 2 parallels.

The data structure and index structure was not quite optimal for this kind of query (which does not matter in this case).

The comparison of the explain analyze plans on win-x64 and Linux x64 showed about 3 times longer execution on windows.

For comparison I reinstalled the production data on two test databases on different virtual machines on the same hardware (the very same machine with Hyper-V virtualization).

The steps were only (on a mostly complete idle machine):
1. create test database
2. pg_restore of the production data from same dump file
3. analyze on the database
4. run the query multiple times (about 5 times) and took the fastest explain analyze.

On fedora 34 64 bit, PG 13.2 unmodified self compiled the query took about 33 ms.
On Windows Server 2019 64 bit, PG 13.2 from EDB download packages the query took about 85 ms.

                                                 version
----------------------------------------------------------------------------------------------------------
 PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.1.1 20210428 (Red Hat 11.1.1-1), 64-bit

                          version
------------------------------------------------------------
 PostgreSQL 13.2, compiled by Visual C++ build 1914, 64-bit

The corresponding explain plans are available at explain.depesz.com

-- fedora
https://explain.depesz.com/s/Mq3P

-- windows
https://explain.depesz.com/s/VLtZ

The main difference is the time shown for the Gather Merge step (65 ms vs. 7 ms)

The explain plans and the non-standard GUCs are included in the attachments, the configuration for the databases seems quite comparable.

Unfortunately I cannot disclose the query and the table data.

My experience with parallel queries is not very wide, but this massive execution time difference of the exact same query on the exact same data on the exact same hardware with the same, unmodified last stable Postgres version is very astonishing.

BTW I generally observed slower execution under Windows, so production has moved now to Linux.

There seem no relevant GUC differences concerning query execution, so the performance penalty of 300% to 900% (one step only) is not easily explainable.

The databases remain on the system to repeat the queries on request in the queue of further investigation.


Thanks for looking.

Hans Buschmann

                                                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=8284.93..8437.72 rows=2022 width=198) (actual time=31.506..32.637 rows=34 loops=1)
   CTE qsum
     ->  Sort  (cost=8280.38..8284.93 rows=1822 width=180) (actual time=31.502..32.577 rows=22 loops=1)
           Sort Key: or_followup.of_season, orders.or_clis_sub_code
           Sort Method: quicksort  Memory: 28kB
           ->  Hash Left Join  (cost=7864.72..8181.70 rows=1822 width=180) (actual time=31.274..32.566 rows=22 loops=1)
                 Hash Cond: (or_followup.of_season = seasons.id_sea)
                 ->  Hash Left Join  (cost=7862.95..8165.94 rows=1822 width=106) (actual time=31.252..32.523 rows=22 loops=1)
                       Hash Cond: (orders.or_clis_sub_code = clients_sub.clis_sub_code)
                       ->  Finalize GroupAggregate  (cost=7818.93..8098.89 rows=1822 width=94) (actual time=30.987..32.251 rows=22 loops=1)
                             Group Key: or_followup.of_season, orders.or_clis_sub_code
                             ->  Gather Merge  (cost=7818.93..8028.30 rows=1518 width=94) (actual time=30.978..32.207 rows=30 loops=1)
                                   Workers Planned: 2
                                   Workers Launched: 2
                                   ->  Partial GroupAggregate  (cost=6818.91..6853.06 rows=759 width=94) (actual time=24.688..25.201 rows=10 loops=3)
                                         Group Key: or_followup.of_season, orders.or_clis_sub_code
                                         ->  Sort  (cost=6818.91..6820.80 rows=759 width=20) (actual time=24.601..24.652 rows=1334 loops=3)
                                               Sort Key: or_followup.of_season, orders.or_clis_sub_code
                                               Sort Method: quicksort  Memory: 53kB
                                               Worker 0:  Sort Method: quicksort  Memory: 179kB
                                               Worker 1:  Sort Method: quicksort  Memory: 203kB
                                               ->  Parallel Hash Join  (cost=4325.21..6782.59 rows=759 width=20) (actual time=19.738..24.311 rows=1334 loops=3)
                                                     Hash Cond: (or_followup.of_id_or = orders.id_or)
                                                     ->  Parallel Append  (cost=0.00..2411.63 rows=11358 width=16) (actual time=0.111..3.899 rows=9074 loops=3)
                                                           ->  Parallel Bitmap Heap Scan on or_followup_archiv or_followup_2  (cost=7.80..1867.52 rows=10947 width=16) (actual time=0.108..2.485 rows=6191 loops=3)
                                                                 Recheck Cond: ((of_season >= 22) AND (of_season <= 33))
                                                                 Rows Removed by Index Recheck: 1078
                                                                 Heap Blocks: lossy=60
                                                                 ->  Bitmap Index Scan on brin_or_followup_archiv_season  (cost=0.00..3.14 rows=19462 width=0) (actual time=0.153..0.153 rows=6400 loops=1)
                                                                       Index Cond: ((of_season >= 22) AND (of_season <= 33))
                                                           ->  Parallel Seq Scan on or_followup or_followup_1  (cost=0.00..487.32 rows=5088 width=16) (actual time=0.017..3.097 rows=8650 loops=1)
                                                                 Filter: ((of_season >= 22) AND (of_season <= 33))
                                                     ->  Parallel Hash  (cost=4299.83..4299.83 rows=2030 width=12) (actual time=17.321..17.325rows=1434 loops=3)
                                                           Buckets: 8192  Batches: 1  Memory Usage: 288kB
                                                           ->  Parallel Hash Join  (cost=1896.60..4299.83 rows=2030 width=12) (actual time=10.133..16.658 rows=1434 loops=3)
                                                                 Hash Cond: (orders.or_id_pr = projects.id_pr)
                                                                 ->  Parallel Append  (cost=0.00..2280.84 rows=30382 width=16) (actual time=0.010..8.750 rows=24306 loops=3)
                                                                       ->  Parallel Seq Scan on orders_archiv orders_2  (cost=0.00..1826.94 rows=37594 width=16) (actual time=0.008..4.511 rows=21303 loops=3)
                                                                       ->  Parallel Seq Scan on orders orders_1  (cost=0.00..301.99 rows=5299 width=16) (actual time=0.007..1.519 rows=9008 loops=1)
                                                                 ->  Parallel Hash  (cost=1891.01..1891.01 rows=447 width=4) (actual time=2.818..2.820 rows=349 loops=3)
                                                                       Buckets: 2048  Batches: 1  Memory Usage: 112kB
                                                                       ->  Parallel Append  (cost=0.00..1891.01 rows=447 width=4) (actual time=0.211..3.386 rows=524 loops=2)
                                                                             ->  Parallel Seq Scan on projects_archiv projects_2  (cost=0.00..1725.90 rows=426 width=4) (actual time=1.975..3.086 rows=349 loops=2)
                                                                                   Filter: (pr_cli_code = 558)
                                                                                   Rows Removed by Filter: 39152
                                                                             ->  Parallel Seq Scan on projects projects_1  (cost=0.00..162.88 rows=205 width=4) (actual time=0.005..0.545 rows=349 loops=1)
                                                                                   Filter: (pr_cli_code = 558)
                                                                                   Rows Removed by Filter: 6434
                       ->  Hash  (cost=32.90..32.90 rows=890 width=16) (actual time=0.257..0.258 rows=890 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 52kB
                             ->  Seq Scan on clients_sub  (cost=0.00..32.90 rows=890 width=16) (actual time=0.008..0.162 rows=890 loops=1)
                 ->  Hash  (cost=1.34..1.34 rows=34 width=12) (actual time=0.012..0.013 rows=34 loops=1)
                       Buckets: 1024  Batches: 1  Memory Usage: 10kB
                       ->  Seq Scan on seasons  (cost=0.00..1.34 rows=34 width=12) (actual time=0.005..0.008 rows=34 loops=1)
   ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..63.77 rows=1822 width=284) (actual time=31.505..31.514 rows=22 loops=1)
         ->  CTE Scan on qsum  (cost=0.00..36.44 rows=1822 width=236) (actual time=31.504..31.509 rows=22 loops=1)
   ->  Subquery Scan on "*SELECT* 2"  (cost=76.41..78.91 rows=200 width=198) (actual time=0.044..0.046 rows=12 loops=1)
         ->  Sort  (cost=76.41..76.91 rows=200 width=200) (actual time=0.043..0.044 rows=12 loops=1)
               Sort Key: qsum_1.of_season
               Sort Method: quicksort  Memory: 26kB
               ->  HashAggregate  (cost=63.77..68.77 rows=200 width=200) (actual time=0.022..0.036 rows=12 loops=1)
                     Group Key: qsum_1.of_season, qsum_1.sea_name
                     Batches: 1  Memory Usage: 48kB
                     ->  CTE Scan on qsum qsum_1  (cost=0.00..36.44 rows=1822 width=120) (actual time=0.000..0.002 rows=22 loops=1)
 Planning Time: 0.820 ms
 Execution Time: 33.183 ms
(66 rows)
                                                                                                     QUERY PLAN                                                       
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=8270.74..8419.86 rows=1973 width=198) (actual time=80.021..81.287 rows=34 loops=1)
   CTE qsum
     ->  Sort  (cost=8266.30..8270.74 rows=1773 width=180) (actual time=80.017..81.211 rows=22 loops=1)
           Sort Key: or_followup.of_season, orders.or_clis_sub_code
           Sort Method: quicksort  Memory: 28kB
           ->  Hash Left Join  (cost=7862.04..8170.63 rows=1773 width=180) (actual time=78.013..81.200 rows=22 loops=1)
                 Hash Cond: (or_followup.of_season = seasons.id_sea)
                 ->  Hash Left Join  (cost=7860.28..8155.25 rows=1773 width=106) (actual time=77.988..81.147 rows=22 loops=1)
                       Hash Cond: (orders.or_clis_sub_code = clients_sub.clis_sub_code)
                       ->  Finalize GroupAggregate  (cost=7816.25..8088.82 rows=1773 width=94) (actual time=77.712..80.862 rows=22 loops=1)
                             Group Key: or_followup.of_season, orders.or_clis_sub_code
                             ->  Gather Merge  (cost=7816.25..8020.10 rows=1478 width=94) (actual time=77.615..80.809 rows=22 loops=1)
                                   Workers Planned: 2
                                   Workers Launched: 2
                                   ->  Partial GroupAggregate  (cost=6816.23..6849.48 rows=739 width=94) (actual time=14.315..14.985 rows=7 loops=3)
                                         Group Key: or_followup.of_season, orders.or_clis_sub_code
                                         ->  Sort  (cost=6816.23..6818.07 rows=739 width=20) (actual time=14.233..14.291 rows=1334 loops=3)
                                               Sort Key: or_followup.of_season, orders.or_clis_sub_code
                                               Sort Method: quicksort  Memory: 409kB
                                               Worker 0:  Sort Method: quicksort  Memory: 25kB
                                               Worker 1:  Sort Method: quicksort  Memory: 25kB
                                               ->  Parallel Hash Join  (cost=4324.32..6781.02 rows=739 width=20) (actual time=8.666..14.013 rows=1334 loops=3)
                                                     Hash Cond: (or_followup.of_id_or = orders.id_or)
                                                     ->  Parallel Append  (cost=0.00..2411.26 rows=11295 width=16) (actual time=0.005..12.457 rows=27223 loops=1)
                                                           ->  Parallel Bitmap Heap Scan on or_followup_archiv or_followup_2  (cost=7.76..1867.46 rows=10858 width=16) (actual time=0.222..7.858 rows=18573 loops=1)
                                                                 Recheck Cond: ((of_season >= 22) AND (of_season <= 33))
                                                                 Rows Removed by Index Recheck: 3234
                                                                 Heap Blocks: lossy=632
                                                                 ->  Bitmap Index Scan on brin_or_followup_archiv_season  (cost=0.00..3.14 rows=19460 width=0) (actual time=0.055..0.055 rows=6400 loops=1)
                                                                       Index Cond: ((of_season >= 22) AND (of_season <= 33))
                                                           ->  Parallel Seq Scan on or_followup or_followup_1  (cost=0.00..487.32 rows=5088 width=16) (actual time=0.004..3.503 rows=8650 loops=1)
                                                                 Filter: ((of_season >= 22) AND (of_season <= 33))
                                                     ->  Parallel Hash  (cost=4299.48..4299.48 rows=1987 width=12) (actual time=8.612..8.614 rows=1434 loops=3)
                                                           Buckets: 8192  Batches: 1  Memory Usage: 288kB
                                                           ->  Parallel Hash Join  (cost=1896.43..4299.48 rows=1987 width=12) (actual time=6.271..24.967 rows=4301 loops=1)
                                                                 Hash Cond: (orders.or_id_pr = projects.id_pr)
                                                                 ->  Parallel Append  (cost=0.00..2280.84 rows=30382 width=16) (actual time=0.003..13.978 rows=72918 loops=1)
                                                                       ->  Parallel Seq Scan on orders_archiv orders_2  (cost=0.00..1826.94 rows=37594 width=16) (actual time=0.002..9.989 rows=63910 loops=1)
                                                                       ->  Parallel Seq Scan on orders orders_1  (cost=0.00..301.99 rows=5299 width=16) (actual time=0.003..1.404 rows=9008 loops=1)
                                                                 ->  Parallel Hash  (cost=1890.96..1890.96 rows=437 width=4) (actual time=6.247..6.248 rows=1047 loops=1)
                                                                       Buckets: 2048  Batches: 1  Memory Usage: 80kB
                                                                       ->  Parallel Append  (cost=0.00..1890.96 rows=437 width=4) (actual time=0.007..6.088 rows=1047 loops=1)
                                                                             ->  Parallel Seq Scan on projects_archiv projects_2  (cost=0.00..1725.90 rows=412 width=4) (actual time=3.954..5.498 rows=698 loops=1)
                                                                                   Filter: (pr_cli_code = 558)
                                                                                   Rows Removed by Filter: 78305
                                                                             ->  Parallel Seq Scan on projects projects_1  (cost=0.00..162.88 rows=205 width=4) (actual time=0.006..0.549 rows=349 loops=1)
                                                                                   Filter: (pr_cli_code = 558)
                                                                                   Rows Removed by Filter: 6434
                       ->  Hash  (cost=32.90..32.90 rows=890 width=16) (actual time=0.264..0.264 rows=890 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 52kB
                             ->  Seq Scan on clients_sub  (cost=0.00..32.90 rows=890 width=16) (actual time=0.013..0.166 rows=890 loops=1)
                 ->  Hash  (cost=1.34..1.34 rows=34 width=12) (actual time=0.016..0.016 rows=34 loops=1)
                       Buckets: 1024  Batches: 1  Memory Usage: 10kB
                       ->  Seq Scan on seasons  (cost=0.00..1.34 rows=34 width=12) (actual time=0.008..0.011 rows=34 loops=1)
   ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..62.05 rows=1773 width=284) (actual time=80.021..80.031 rows=22 loops=1)
         ->  CTE Scan on qsum  (cost=0.00..35.46 rows=1773 width=236) (actual time=80.020..80.025 rows=22 loops=1)
   ->  Subquery Scan on "*SELECT* 2"  (cost=74.70..77.20 rows=200 width=198) (actual time=0.059..0.061 rows=12 loops=1)
         ->  Sort  (cost=74.70..75.20 rows=200 width=200) (actual time=0.058..0.059 rows=12 loops=1)
               Sort Key: qsum_1.of_season
               Sort Method: quicksort  Memory: 26kB
               ->  HashAggregate  (cost=62.05..67.06 rows=200 width=200) (actual time=0.033..0.050 rows=12 loops=1)
                     Group Key: qsum_1.of_season, qsum_1.sea_name
                     Batches: 1  Memory Usage: 48kB
                     ->  CTE Scan on qsum qsum_1  (cost=0.00..35.46 rows=1773 width=120) (actual time=0.001..0.003 rows=22 loops=1)
 Planning Time: 1.476 ms
 Execution Time: 82.863 ms
(66 Zeilen)
              name               |              current_setting               |        source        |
---------------------------------+--------------------------------------------+----------------------+
 application_name                | psql                                       | client               |
 auto_explain.log_analyze        | on                                         | configuration file   |
 auto_explain.log_min_duration   | 1ms                                        | configuration file   |
 client_encoding                 | WIN1252                                    | client               |
 cluster_name                    | CPS_PROD                                   | configuration file   |
 data_checksums                  | on                                         | override             |
 DateStyle                       | ISO, DMY                                   | configuration file   |
 default_text_search_config      | pg_catalog.german                          | configuration file   |
 dynamic_shared_memory_type      | windows                                    | configuration file   |
 effective_cache_size            | 4GB                                        | configuration file   |
 huge_pages                      | try                                        | configuration file   |
 lc_collate                      | C                                          | override             |
 lc_ctype                        | German_Germany.1252                        | override             |
 lc_messages                     | C                                          | configuration file   |
 lc_monetary                     | German_Germany.1252                        | configuration file   |
 lc_numeric                      | German_Germany.1252                        | configuration file   |
 lc_time                         | German_Germany.1252                        | configuration file   |
 listen_addresses                | *                                          | configuration file   |
 log_destination                 | stderr                                     | configuration file   |
 log_directory                   | L:/ZZ_log/pg_log_prod                      | configuration file   |
 log_file_mode                   | 0640                                       | configuration file   |
 log_line_prefix                 | CPS PRD %t %i %e %2l:>                     | configuration file   |
 log_statement                   | mod                                        | configuration file   |
 log_temp_files                  | 0                                          | configuration file   |
 log_timezone                    | CET                                        | configuration file   |
 logging_collector               | on                                         | configuration file   |
 maintenance_work_mem            | 512MB                                      | configuration file   |
 max_connections                 | 100                                        | configuration file   |
 max_stack_depth                 | 2MB                                        | environment variable |
 max_wal_size                    | 1GB                                        | configuration file   |
 min_wal_size                    | 80MB                                       | configuration file   |
 pg_prewarm.autoprewarm_interval | 1d                                         | configuration file   |
 pg_stat_statements.max          | 10000                                      | configuration file   |
 pg_stat_statements.track        | all                                        | configuration file   |
 random_page_cost                | 1                                          | configuration file   |
 search_path                     | public, archiv, ablage, admin              | database             |
 server_encoding                 | UTF8                                       | override             |
 server_version                  | 13.2                                       | default              |
 shared_buffers                  | 768MB                                      | configuration file   |
 shared_preload_libraries        | auto_explain,pg_stat_statements,pg_prewarm | configuration file   |
 temp_buffers                    | 128MB                                      | configuration file   |
 TimeZone                        | CET                                        | configuration file   |
 transaction_deferrable          | off                                        | override             |
 transaction_isolation           | read committed                             | override             |
 transaction_read_only           | off                                        | override             |
 update_process_title            | off                                        | configuration file   |
 wal_buffers                     | 16MB                                       | override             |
 wal_segment_size                | 16MB                                       | override             |
 work_mem                        | 64MB                                       | configuration file   |
(49 Zeilen)
                name                |              current_setting               |        source        |
------------------------------------+--------------------------------------------+----------------------+
 application_name                   | psql                                       | client               |
 archive_mode                       | on                                         | configuration file   |
 auto_explain.log_analyze           | on                                         | configuration file   |
 auto_explain.log_min_duration      | 0                                          | configuration file   |
 auto_explain.log_nested_statements | on                                         | configuration file   |
 client_encoding                    | UTF8                                       | client               |
 cluster_name                       | HB_LUX                                     | configuration file   |
 data_checksums                     | on                                         | override             |
 DateStyle                          | ISO, MDY                                   | configuration file   |
 default_text_search_config         | pg_catalog.english                         | configuration file   |
 dynamic_shared_memory_type         | posix                                      | configuration file   |
 effective_cache_size               | 4GB                                        | configuration file   |
 lc_collate                         | C                                          | override             |
 lc_ctype                           | en_US.UTF-8                                | override             |
 lc_messages                        | C                                          | configuration file   |
 lc_monetary                        | en_US.UTF-8                                | configuration file   |
 lc_numeric                         | en_US.UTF-8                                | configuration file   |
 lc_time                            | en_US.UTF-8                                | configuration file   |
 listen_addresses                   | *                                          | configuration file   |
 log_destination                    | stderr                                     | configuration file   |
 log_directory                      | /usr/local/hb/zz_log/pg_log_hblux          | configuration file   |
 log_line_prefix                    | LHB %a %t %i %e %2l:>                      | configuration file   |
 log_statement                      | mod                                        | configuration file   |
 log_temp_files                     | 0                                          | configuration file   |
 log_timezone                       | Europe/Berlin                              | configuration file   |
 logging_collector                  | on                                         | configuration file   |
 maintenance_work_mem               | 128MB                                      | configuration file   |
 max_connections                    | 20                                         | configuration file   |
 max_stack_depth                    | 2MB                                        | environment variable |
 max_wal_size                       | 1GB                                        | configuration file   |
 min_wal_size                       | 80MB                                       | configuration file   |
 pg_prewarm.autoprewarm             | off                                        | configuration file   |
 pg_stat_statements.max             | 8000                                       | configuration file   |
 pg_stat_statements.track           | all                                        | configuration file   |
 random_page_cost                   | 1                                          | configuration file   |
 search_path                        | public, archiv, ablage, admin              | database             |
 server_encoding                    | UTF8                                       | override             |
 server_version                     | 13.2                                       | default              |
 shared_buffers                     | 512MB                                      | configuration file   |
 shared_preload_libraries           | auto_explain,pg_stat_statements,pg_prewarm | configuration file   |
 temp_buffers                       | 32MB                                       | configuration file   |
 TimeZone                           | Europe/Berlin                              | configuration file   |
 transaction_deferrable             | off                                        | override             |
 transaction_isolation              | read committed                             | override             |
 transaction_read_only              | off                                        | override             |
 update_process_title               | off                                        | configuration file   |
 wal_buffers                        | 16MB                                       | override             |
 wal_compression                    | on                                         | configuration file   |
 wal_segment_size                   | 16MB                                       | override             |
 work_mem                           | 64MB                                       | configuration file   |
(50 rows)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux