Hi there,
We are testing a new application to
try to find performance issues.
AWS RDS, m4.large, 500 GB SSD storage
One table only, called Messages:
Uuid
Country (ISO)
Role (Text)
User id (Text)
GroupId (integer)
Channel (text)
Title (Text)
Payload (JSON, up to 20kb)
Starts_in (UTC)
Expires_in (UTC)
Seen (boolean)
Deleted (boolean)
LastUpdate (UTC)
Created_by (UTC)
Created_in (UTC)
Indexes (a rough DDL sketch of the table and indexes follows the list):
UUID (PK)
UserID + Country (main index)
LastUpdate
GroupID
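For concreteness, a roughly equivalent DDL sketch (column types and index names are my approximations):

CREATE TABLE messages (
    uuid        uuid PRIMARY KEY,
    country     char(2),        -- ISO country code
    role        text,
    user_id     text,
    group_id    integer,
    channel     text,
    title       text,
    payload     jsonb,          -- up to ~20 KB per message
    starts_in   timestamptz,
    expires_in  timestamptz,
    seen        boolean,
    deleted     boolean,
    last_update timestamptz,
    created_by  timestamptz,    -- listed as UTC above
    created_in  timestamptz
);

CREATE INDEX messages_user_country_idx ON messages (user_id, country);  -- main index
CREATE INDEX messages_last_update_idx  ON messages (last_update);
CREATE INDEX messages_group_id_idx     ON messages (group_id);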
We inserted 160 million rows, around 2 KB each. No partitioning.
Inserts started at around 3,000 per second but (as expected) slowed down as the number of rows increased. By the end we were getting around 500 inserts per second.
Queries by UserID + Country took less than 2 seconds, but while the batch insert was running the same queries took over 20 seconds!
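For reference, the query is essentially this (parameter values are illustrative):

SELECT *
FROM messages
WHERE user_id = $1
  AND country = $2;

It should be served by the (user_id, country) index and returns roughly 100 rows per call.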
We had 20 Lambda functions pulling messages from SQS and bulk inserting them into PostgreSQL.
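Each batch is essentially a single multi-row INSERT, something like this (the values are made up, the real data comes from SQS):

INSERT INTO messages (uuid, country, role, user_id, group_id, channel, title,
                      payload, starts_in, expires_in, seen, deleted,
                      last_update, created_by, created_in)
VALUES
    ('6c1d6a53-0a6d-4b08-9d7a-3e3f2b1c4d5e', 'BR', 'admin', 'user-123', 1,
     'push', 'Hello', '{"body": "hello"}', now(), now() + interval '7 days',
     false, false, now(), now(), now()),
    -- ... one tuple per message in the batch
    ('a2b4c6d8-1111-4222-8333-944455566677', 'US', 'admin', 'user-456', 2,
     'email', 'Hi', '{"body": "hi"}', now(), now() + interval '7 days',
     false, false, now(), now(), now());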
Insert performance is important, but we would slow it down if needed in order to ensure flatter query performance (below 2 seconds). Each query (UserID + Country) returns around 100 different messages, which are filtered and ordered by the synchronous Lambda function. So we don't do any special filtering, sorting, ordering or full text search in Postgres. In some ways we use it more like a glorified file system. :)
We are going to limit the number of Lambda workers to 1 or 2 and then run some queries concurrently, to see whether query performance is affected too much. We aim to get at least 50 queries per second (each returning 100 messages) in under 2 seconds, even while there are millions of messages on SQS being inserted into PG.
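One way we may drive that concurrent-query test is pgbench with a custom script, roughly like this (endpoint, credentials and the sample user/country are placeholders):

-- query.sql
SELECT * FROM messages WHERE user_id = 'user-123' AND country = 'BR';

pgbench -n -f query.sql -c 50 -T 60 -h <rds-endpoint> -U <user> <dbname>

That runs 50 concurrent clients for 60 seconds and reports average latency and tps, so we can watch query latency while the insert Lambdas are running.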
We haven't done
any performance tuning in the DB.
With all that said, the question is: what can be done to ensure good query performance (UserID + Country) even while the bulk insert (low priority) is running?
We are limited to AWS RDS at the moment.
Cheers