Re: Postgres performance comparing GCP and AWS

Milos Babic <milos.babic@xxxxxxxxx> · Wed, 24 Feb 2021 14:15:36 +0100

Hi Maurici,
as a starting point: can you make sure your GPC instance is configured in the same way AWS is?
Once you do it, repeat the tests, and post the outcome.

Thanks,
Milos

On Tue, Feb 23, 2021 at 11:14 PM Maurici Meneghetti <maurici.meneghetti@xxxxxxxxxxxxxxxxxxxx> wrote:
Hi everyone,

I have 2 postgres instances created from the same dump (backup), one on a GCP VM and the other on AWS RDS. The first instance takes 18 minutes and the second one takes less than 20s to run this simples query:
SELECT "Id", "DateTime", "SignalRegisterId", "Raw" FROM "SignalRecordsBlobs" WHERE "SignalSettingId" = 103 AND "DateTime" BETWEEN '2019-11-28T14:00:12.540200000' AND '2020-07-23T21:12:32.249000000';
I’ve run this query a few times to make sure both should be reading data from cache.
I expect my postgres on GPC to be at least similar to the one managed by AWS RDS so that I can work on improvements parallelly and compare.

DETAILS:
Query explain for Postgres on GCP VM:
Bitmap Heap Scan on SignalRecordsBlobs SignalRecordsBlobs  (cost=18.80..2480.65 rows=799 width=70) (actual time=216.766..776.032 rows=5122 loops=1)
    Filter: (("DateTime" >= \'2019-11-28 14:00:12.5402\'::timestamp without time zone) AND ("DateTime" <= \'2020-07-23 21:12:32.249\'::timestamp without time zone))
    Heap Blocks: exact=5223
    Buffers: shared hit=423 read=4821
  ->  Bitmap Index Scan on IDX_SignalRecordsBlobs_SignalSettingId  (cost=0.00..18.61 rows=824 width=0) (actual time=109.000..109.001 rows=5228 loops=1)
          Index Cond: ("SignalSettingId" = 103)
          Buffers: shared hit=3 read=18
Planning time: 456.315 ms
Execution time: 776.976 ms

Query explain for Postgres on AWS RDS:
Bitmap Heap Scan on SignalRecordsBlobs SignalRecordsBlobs  (cost=190.02..13204.28 rows=6213 width=69) (actual time=2.215..14.505 rows=5122 loops=1)
    Filter: (("DateTime" >= \'2019-11-28 14:00:12.5402\'::timestamp without time zone) AND ("DateTime" <= \'2020-07-23 21:12:32.249\'::timestamp without time zone))
    Heap Blocks: exact=5209
    Buffers: shared hit=3290 read=1948
  ->  Bitmap Index Scan on IDX_SignalRecordsBlobs_SignalSettingId  (cost=0.00..188.46 rows=6405 width=0) (actual time=1.159..1.159 rows=5228 loops=1)
          Index Cond: ("SignalSettingId" = 103)
          Buffers: shared hit=3 read=26
Planning time: 0.407 ms
Execution time: 14.87 ms

PostgreSQL version number running:
•	VM on GCP: PostgreSQL 11.10 (Debian 11.10-0+deb10u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
•	Managed by RDS on AWS: PostgreSQL 11.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit

How PostgreSQL was installed:
•	VM on GCP: Already installed when created VM running Debian on Google Console. 
•	Managed by RDS on AWS: RDS managed the installation.

Changes made to the settings in the postgresql.conf file:
Here are some postgres parameters that might be useful:
Instance on VM on GCP (2 vCPUs, 2 GB memory, 800 GB disk):
•	effective_cache_size: 1496MB
•	maintenance_work_mem: 255462kB (close to 249MB)
•	max_wal_size: 1GB
•	min_wal_size: 512MB
•	shared_buffers: 510920kB (close to 499MB)
•	max_locks_per_transaction 1000
•	wal_buffers: 15320kB (close to 15MB)
•	work_mem: 2554kB
•	effective_io_concurrency: 200
•	dynamic_shared_memory_type: posix
On this instance we installed a postgres extension called timescaledb to gain performance on other tables. Some of these parameters were set using recommendations from that extension.

Instance managed by RDS (2 vCPUs, 2 GiB RAM, 250GB disk, 750 de IOPS):
•	effective_cache_size: 1887792kB (close to 1844MB)
•	maintenance_work_mem: 64MB
•	max_wal_size: 2GB
•	min_wal_size: 192MB
•	shared_buffers: 943896kB (close to 922MB)
•	max_locks_per_transaction 64

Operating system and version by runing "uname -a":
•	VM on GCP: Linux {{{my instance name}}} 4.19.0-14-cloud-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
•	Managed by AWS RDS: Aparently Red Hay as shown using SELECT version();

Program used to connect to PostgreSQL: Python psycopg2.connect() to create the connection and pandas read_sql_query() to query using that connection.
Thanks in advance