Re: Half billion records in one table? RDS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Why not store metadata in pg and the payload in S3? 

On Mon, Nov 27, 2017 at 11:58 AM Jean Baro <jfbaro@xxxxxxxxx> wrote:
Hi there,

We are creating a new DB which will behave most like a file system, I mean, there will be no complex queries or joins running in the DB. The idea is to grab the WHOLE set of messages for a particular user and then filter, order, combine or full text search in the function itself (AWS Lambda). The maximum number of messages is limited to 1.000 messages per user. So we expect Postgres to have an amazing performance for this scenario.

As I am not really familiar with PG (9.6, or 10, in case RDS release it before February) I would like to share what we are planning to do for this DB. So if you guys could share your thoughts, that would be great! :)

Table structure:

 

·         MessageID (UUID) - PK

·         UserCountry (ISO)

·         UserRole (TEXT 15)

·         UserID (TEXT 30) – FK (although there is no constraint)

·         LifeCycle (RANGE DATE? Or 2 TimeStampWithTZ? Start_date and end_date?)

·         Channel (TEXT 15)

·         Tags (TEXT 2000)

·         Menu (TEXT 200)

·         Icon (TEXT 500) – URL to an image which will be used as an icon;

·         Title (TEXT 150)

·         Body (JSON – up to 10K) – Meta data describing all the data to a specific type of message. The JSON changes according to the type of message. We are assuming most messages will use less than 1K for this field.

·         Delete (BOOLEAN) – Soft Delete

·         Created (Timestamp – With TZ)

·         CreatedBy (TEXT 50)

 

Only 1 table

·         Messages

3 indexes:

·         MessageID PK (UUID)

·         Main fetch key (UserCountry + UserID) - *****

·         End_date (To locate old messages that can be moved to another DB - which will hold the old messages);

 

Sizing and worst case scenario:

 

·         500MM messages in the main DB

·         4K queries per second (by UserID) – Max time of 500ms per query. Simples SELECT, with no ORDER, WHERE OR GROUP BY. Just grab all the messages for a particular user. MAX 1000 messages per USER.

·         1K inserts per second on average (So that in 1 hour we can insert around 3MM messages)

·         1K deletes per second on average (So that in 1 hour we can remove around 3MM messages)


My question is:

  • Can we use any kind of compression for PostgreSQL which would result in reduced IO and disk size?
  • We are not relying on any kind of table partitioning, is that the best approach for this scenario?
  • Is PG on RDS capable of delivering this type of performance while requiring low maintenance?
  • What about Auto Vacuum? Any suggestion how to optimize it for such a work load (we will insert and delete millions of rows every day).
P.S.: We are going to test all this, but if we don't get the performance we are expecting, all optimization tips from you guys will be really appreciated. :)

Thanks



--

Regards,
/Aaron

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux