Table structure (a rough DDL sketch is included below):
· MessageID (UUID) - PK
· UserCountry (ISO)
· UserRole (TEXT 15)
· UserID (TEXT 30) – FK (although there is no constraint)
· LifeCycle (DATERANGE? Or two TIMESTAMP WITH TIME ZONE columns, start_date and end_date?)
· Channel (TEXT 15)
· Tags (TEXT 2000)
· Menu (TEXT 200)
· Icon (TEXT 500) – URL to the image that will be used as the icon
· Title (TEXT 150)
· Body (JSON – up to 10K) – Metadata describing all the data for a specific type of message. The JSON structure changes according to the message type. We assume most messages will use less than 1K for this field.
· Delete (BOOLEAN) – Soft Delete
· Created (Timestamp – With TZ)
· CreatedBy (TEXT 50)
Only 1 table:
· Messages
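To make the structure concrete, here is a rough DDL sketch of what we have in mind (column types, NOT NULL choices and the two-timestamp lifecycle are our current guesses, not final decisions; the soft-delete flag is spelled "deleted" only because DELETE is a reserved word):

    CREATE TABLE messages (
        message_id   UUID PRIMARY KEY,
        user_country CHAR(2)      NOT NULL,   -- ISO country code
        user_role    VARCHAR(15),
        user_id      VARCHAR(30)  NOT NULL,   -- logical FK, no constraint enforced
        start_date   TIMESTAMPTZ,             -- lifecycle start (two-timestamp option)
        end_date     TIMESTAMPTZ,             -- lifecycle end
        channel      VARCHAR(15),
        tags         VARCHAR(2000),
        menu         VARCHAR(200),
        icon         VARCHAR(500),            -- URL of the image used as the icon
        title        VARCHAR(150),
        body         JSON,                    -- type-specific metadata, usually < 1K, up to ~10K
        deleted      BOOLEAN NOT NULL DEFAULT false,   -- soft delete
        created      TIMESTAMPTZ NOT NULL DEFAULT now(),
        created_by   VARCHAR(50)
    );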
3 indexes (DDL sketch after this list):
· MessageID PK (UUID)
· Main fetch key (UserCountry + UserID) – this is the critical one
· End_date (to locate old messages that can be moved to a separate database holding the archived messages)
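In DDL terms, the secondary indexes would look roughly like this (index names are placeholders):

    -- The PRIMARY KEY on message_id already creates the first (unique) index.

    -- Main fetch path: all messages for a given user in a given country.
    CREATE INDEX messages_country_user_idx ON messages (user_country, user_id);

    -- Used by the archiving job to find messages whose lifecycle has ended.
    CREATE INDEX messages_end_date_idx ON messages (end_date);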
Sizing and worst case scenario:
· 500MM messages in the main DB
· 4K queries per second (by UserID) – max 500ms per query. A simple SELECT with no ORDER BY or GROUP BY and no WHERE conditions beyond the user key – just grab all the messages for a particular user. Max 1,000 messages per user (see the query sketch after this list).
· 1K inserts per second on average (so in 1 hour we insert around 3.6MM messages)
· 1K deletes per second on average (so in 1 hour we remove around 3.6MM messages)
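The hot query is essentially the following (parameter placeholders shown; whether the soft-delete filter and a physical DELETE belong on these paths is still open on our side):

    -- Read path: ~4K queries/s, must finish within 500ms, at most ~1,000 rows per user.
    SELECT *
      FROM messages
     WHERE user_country = $1
       AND user_id = $2
       AND deleted = false;   -- assuming soft-deleted rows are hidden from readers

    -- Write path: ~1K inserts/s and ~1K deletes/s on average. Depending on the
    -- retention design, a "delete" may be physical or just the soft-delete flag:
    DELETE FROM messages WHERE message_id = $1;
    -- or
    UPDATE messages SET deleted = true WHERE message_id = $1;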
My questions are:
- Can we use any kind of compression in PostgreSQL that would reduce IO and disk size?
- We are not relying on any kind of table partitioning; is that the best approach for this scenario? (A sketch of the partitioned layout we would otherwise consider is at the end of this post.)
- Is PostgreSQL on RDS capable of delivering this kind of performance while requiring low maintenance?
- What about autovacuum? Any suggestions on how to tune it for such a workload? We will insert and delete millions of rows every day (the kind of per-table settings we are wondering about is sketched below).
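To make the partitioning and autovacuum questions more concrete, these are the kinds of changes we have in mind but have not committed to (table and index names, partition boundaries and parameter values are purely illustrative):

    -- Declarative range partitioning on end_date, so expired partitions could be
    -- detached and archived instead of deleting rows one by one. Note that the
    -- primary key would then have to include the partition column.
    CREATE TABLE messages_partitioned (
        message_id   UUID         NOT NULL,
        user_country CHAR(2)      NOT NULL,
        user_id      VARCHAR(30)  NOT NULL,
        end_date     TIMESTAMPTZ  NOT NULL,
        body         JSON,
        -- ... remaining columns as in the sketch above ...
        PRIMARY KEY (message_id, end_date)
    ) PARTITION BY RANGE (end_date);

    CREATE TABLE messages_2024_q1 PARTITION OF messages_partitioned
        FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

    -- Per-table autovacuum tuning for a table that churns millions of rows per day.
    ALTER TABLE messages SET (
        autovacuum_vacuum_scale_factor  = 0.01,  -- vacuum at ~1% dead rows instead of the 20% default
        autovacuum_analyze_scale_factor = 0.02,  -- keep planner statistics fresh under heavy churn
        autovacuum_vacuum_cost_limit    = 2000   -- let each autovacuum pass do more work per cost cycle
    );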
Thanks