High level summary: server ram has a significant impact on batch processing performance (no surprise), and AWS processing can largely compete with local servers IF the AWS network connection is optimized. With the recent threads about insert performance (and performance in general), I thought I'd share some numbers that could assist some other Postgres users in planning their environments. I am currently running a Postgres dev server in AWS and we are evaluating a high powered physical server for our data center, for which we received a demo unit from Cisco for testing. Great opportunity to test a range of pinch points that could restrict scalability and performance, comparing how 2 very different servers behave under a high bulk loading/transform scenario. The scenario is that I'm migrating mysql data ("v1", eventually 20tb of genomics data) over to a new Postgres server ("v2"). [As a side note, I'm attempting to get a third server spun up, being a high powered AWS EC2 instance (an r3.4xlarge with 122gb ram, 16 cores, 6tb SSD EBS Optimized with 16k guaranteed IOPS). When I finish the testing against the 3rd server, I'll report again.] Landscape: Source mysql server: Dell physical 24 cores at 2.8ghz, 32gb ram, 1gbe networking, Percona/mysql v5.5.3 on linux in our data center AWS: EC2 m4.xlarge instance with 16 gb ram, 4 cores at 2.4ghz, 3tb SSD. PG v9.5.1 on Red Hat 4.8.5-4 64 bit, on a 10gb Direct Connect link from our data center to. Cisco: Hyperflex HX240c M4 node with UCS B200 M4 blade, with 256gb ram, 48 cores at 2.2ghz, 4tb direct attached Intel flash (SSD) for the OS, 10tb of NetApp Filer SSD storage via 4gb HBA cards. PG v9.5.1 on Red Hat 4.8.5-4 64 bit, 10gbe networking but has to throttle down to 1gbe when talking to the mysql source server. PASS 1: Process: Extract (pull the raw v1 data over the network to the 32 v2 staging tables) Num Source Rows: 8,232,673 (Small Test) Rowcount Compression: 1.0 (1:1 copy) AWS Time in Secs: 1,516** Cisco Time in Secs: 376 Difference: 4.0x Comment: AWS: 5.7k rows/sec cisco: 21.9k rows/sec (**network speed appears to be the factor, see notes below) Process: Transform/Load (all work local to the server - read, transform, write as a single batch) Num Source Rows: 5,575,255 (many smaller batches from the source tables, all writes going to a single target table) Avg Rowcount Compression: 10.3 (jsonb row compression resulting in 10x fewer rows) AWS Time in Secs: 408 Cisco Time in Secs: 294 Difference: 1.4x (the Cisco is 40% faster...not a huge difference) Comment:AWS: 13.6k rows/sec Cisco: 19k rows/sec Notes: The testing has revealed an issue with the networking in our data center, which appears to be causing abnormally slow transfer speed to AWS. That is being investigated. So if we look at just the Transform/Load process, we can see that both AWS and the local Cisco server have comparable processing speeds on the small dataset. However, when I moved to a medium sized dataset of 204m rows, a different pattern emerged. I'm including just the Transform/Load process here, and testing just ONE table out of the batch: PASS 2: Process: Transform/Load (all work local to the server - read, transform, write as a single batch) Num Source Rows: 10,554,800 (one batch from just a single source table going to a single target table) Avg Rowcount Compression: 31.5 (jsonb row compression resulting in 31.5x fewer rows) AWS Time in Secs: 2,493 (41.5 minutes) Cisco Time in Secs: 661 (10 minutes) Difference: 3.8x Comment:AWS: 4.2k rows/sec Cisco: 16k rows/sec It's obvious the size of the batch exceeded the AWS server memory, resulting in a profoundly slower processing time. This was a true, apples to apples comparison between Pass 1 and Pass 2: average row lengths were within 7% of each other (1121 vs 1203) using identical table structures and processing code, the only difference was the target server. I'm happy to answer questions about these results. Mike Sofen (USA) -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance