Hi there,
We are testing a new application to
try to find performance issues.
AWS RDS, m4.large, 500 GB SSD storage
One table only, called Messages:
Uuid
Country (ISO)
Role (Text)
User id (Text)
GroupId (integer)
Channel (text)
Title (Text)
Payload (JSON, up to 20kb)
Starts_in (UTC)
Expires_in (UTC)
Seen (boolean)
Deleted (boolean)
LastUpdate (UTC)
Created_by (UTC)
Created_in (UTC)
Indexes (a rough DDL sketch of the table and indexes follows the list):
UUID (PK)
UserID + Country (main index)
LastUpdate
GroupID
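For concreteness, a roughly equivalent DDL sketch (column types and index names are my approximations):

CREATE TABLE messages (
    uuid        uuid PRIMARY KEY,
    country     char(2),        -- ISO country code
    role        text,
    user_id     text,
    group_id    integer,
    channel     text,
    title       text,
    payload     jsonb,          -- up to ~20 KB per message
    starts_in   timestamptz,
    expires_in  timestamptz,
    seen        boolean,
    deleted     boolean,
    last_update timestamptz,
    created_by  timestamptz,    -- listed as UTC above
    created_in  timestamptz
);

CREATE INDEX messages_user_country_idx ON messages (user_id, country);  -- main index
CREATE INDEX messages_last_update_idx  ON messages (last_update);
CREATE INDEX messages_group_id_idx     ON messages (group_id);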
We inserted 160 million rows, around 2 KB each. No partitioning.
Inserts started at around 3,000 per second but (as expected) slowed down as the number of rows increased. By the end we were getting around 500 inserts per second.
Queries by UserID + Country took less than 2 seconds, but while the batch insert was running the same queries took over 20 seconds!
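For reference, the query is essentially this (parameter values are illustrative):

SELECT *
FROM messages
WHERE user_id = $1
  AND country = $2;

It should be served by the (user_id, country) index and returns roughly 100 rows per call.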
We had 20 Lambda functions pulling messages from SQS and bulk inserting them into PostgreSQL.
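Each batch is essentially a single multi-row INSERT, something like this (the values are made up, the real data comes from SQS):

INSERT INTO messages (uuid, country, role, user_id, group_id, channel, title,
                      payload, starts_in, expires_in, seen, deleted,
                      last_update, created_by, created_in)
VALUES
    ('6c1d6a53-0a6d-4b08-9d7a-3e3f2b1c4d5e', 'BR', 'admin', 'user-123', 1,
     'push', 'Hello', '{"body": "hello"}', now(), now() + interval '7 days',
     false, false, now(), now(), now()),
    -- ... one tuple per message in the batch
    ('a2b4c6d8-1111-4222-8333-944455566677', 'US', 'admin', 'user-456', 2,
     'email', 'Hi', '{"body": "hi"}', now(), now() + interval '7 days',
     false, false, now(), now(), now());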
Insert performance is important, but we would slow it down if needed in order to ensure flatter query performance (below 2 seconds). Each query (UserID + Country) returns around 100 different messages, which are filtered and ordered by the synchronous Lambda function. So we don't do any special filtering, sorting, ordering or full text search in Postgres. In some ways we use it more like a glorified file system. :)
We are going to limit the number of Lambda workers to 1 or 2 and then run some queries concurrently, to see whether query performance is affected too much. We aim to get at least 50 queries per second (each returning 100 messages) in under 2 seconds, even while there are millions of messages on SQS being inserted into PG.
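One way we may drive that concurrent-query test is pgbench with a custom script, roughly like this (endpoint, credentials and the sample user/country are placeholders):

-- query.sql
SELECT * FROM messages WHERE user_id = 'user-123' AND country = 'BR';

pgbench -n -f query.sql -c 50 -T 60 -h <rds-endpoint> -U <user> <dbname>

That runs 50 concurrent clients for 60 seconds and reports average latency and tps, so we can watch query latency while the insert Lambdas are running.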
We haven't done
any performance tuning in the DB.
With all that said, the question is: what can be done to ensure good query performance (UserID + Country) even while the bulk insert (low priority) is running?
We are limited to AWS RDS at the moment.
Cheers