That’s crazy, only having 8GB of memory when you have tables of over 100GB. One general rule of thumb is to have enough memory to hold the biggest index.

Sent from my iPad

> On Jan 9, 2023, at 3:23 AM, spiral <spiral@xxxxxxxxx> wrote:
>
> Hello,
>
> We have a table containing ~1.75 billion rows, using 170GB of storage.
> The table schema is the following:
>
> messages=# \d messages
>                 Table "public.messages"
>     Column    |  Type   | Collation | Nullable | Default
> --------------+---------+-----------+----------+---------
>  mid          | bigint  |           | not null |
>  channel      | bigint  |           | not null |
>  member       | integer |           |          |
>  sender       | bigint  |           | not null |
>  original_mid | bigint  |           |          |
>  guild        | bigint  |           |          |
> Indexes:
>     "messages_pkey" PRIMARY KEY, btree (mid)
>
>
> This table is used essentially as a key-value store; rows are accessed
> only by the `mid` primary key. Additionally, inserted rows may only be
> deleted, never updated.
>
> We only run the following queries:
> - INSERT INTO messages VALUES (...data...);
> - SELECT * FROM messages WHERE mid = $1;
> - DELETE FROM messages WHERE mid = $1;
> - DELETE FROM messages WHERE mid IN ($1...$n);
> - SELECT count(*) FROM messages;
>
> For the "IN" query, there can be up to 100 parameters, and it is
> possible that none of them will match an existing row.
>
> So, the problem: I don't know how best to store this data in
> Postgres, or what system requirements would be needed.
> Originally, this table did not contain a substantial amount of data,
> so I stored it in the same database as our CRUD user data. However,
> as the table grew, cache was being allocated to (mostly unused)
> historical data from the `messages` table, so I decided to move the
> large table to its own Postgres instance.
>
> At the same time, I partitioned the table with TimescaleDB's automatic
> time-series partitioning, because our data is essentially time-series
> (`mid` values are Twitter-style snowflakes) and I had heard that
> partitioning would improve performance.
> This ended up being a mistake: shared_buffers memory usage went way
> up, from 20GB for the previous combined database to 28GB for just
> the messages database, and trying to lower shared_buffers at all made
> the database start throwing "out of shared memory" errors when running
> DELETE queries. A TimescaleDB update did improve this, but 28GB is far
> more memory than I can afford to allocate to this database - instead of
> "out of shared memory", it now gets OOM-killed by the system.
>
> What is the best course of action here?
> - Ideally, I would like to host this database on a machine with 4
>   (Ryzen) cores, 8GB RAM, and tiered storage (our cloud provider doesn't
>   support adding additional local storage to a VPS plan). Of course,
>   this seems very unrealistic, so it's not a requirement, but the
>   closer we can get to it, the better.
> - Is it a good idea to use table partitioning? I have heard advice that
>   one should partition tables above a couple million rows, but I don't
>   know how true this is. We have a table with ~6 million rows in our
>   main database that has somewhat slow lookups, but we also have a
>   table with ~13 million rows that has fast lookups, so I'm not sure.
>
> Thanks
> spiral
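
To put a number on that "enough memory to hold the biggest index" rule of thumb, you can ask Postgres for the sizes directly. This is just a rough sketch using the standard size functions, and it assumes the plain, unpartitioned table; with the TimescaleDB hypertable the data is spread across chunk tables, so you'd need TimescaleDB's own size helpers there instead. The messages_pkey name comes from the \d output above:

  -- Size of the primary-key index versus the table it belongs to
  SELECT pg_size_pretty(pg_relation_size('messages_pkey'))  AS pkey_index_size,
         pg_size_pretty(pg_table_size('messages'))          AS table_size,
         pg_size_pretty(pg_total_relation_size('messages')) AS total_size;

Comparing pkey_index_size against the 8GB you mentioned gives a concrete check of that rule of thumb, before even counting shared_buffers space for the table data itself.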