Re: Huge Tables

300M rows isn't "huge", but it is starting to get to be real data.

Some notes/very general rules of thumb since you asked a very general question:
1.  Consider raising the statistics target on the table from the default of 100 to something larger - especially if you have a wide variety of data.  (Either set it per column with `ALTER TABLE ... SET STATISTICS` or globally with the `default_statistics_target` parameter; see the statistics-target sketch after this list.)
2.  Consider the `create statistics` command to see if there are any additional hints you can give the planner to help it figure out which columns are related (see the extended-statistics sketch below).
3.  If you partition:
    a.  Your queries could be _slower_ if they don't include the partition criteria.  So partition on something you are likely to almost always want to filter on anyhow.  That way you can take advantage of "partition pruning".
    b.  One of the main advantages of partitioning is being able to archive old data easily - either by moving it to other tables, dropping it, or doing other things with it.  Think about whether you ever intend to roll off old data and figure out ways partitions might make that easier (see the partitioning sketch below).
4.  Consider tweaking `max_parallel_workers` to enable more concurrency if you are running a lot of big queries on your larger table.
    a.  There are a number of other `*parallel*` parameters you can study and tune as well.
5.  Consider bumping `work_mem` if you are running queries that do a lot of sorting and other intermediate work on the larger data sets (the configuration sketch below covers this and the parallel settings).
6.  For a table with only 300M rows, btree is going to be fine for most use cases.  If you have a monotonically increasing/decreasing column you may be able to use a BRIN index on it to save a little space and make for slightly more efficient queries (see the BRIN sketch below).
7.  You may want to tweak the vacuum parameters to let vacuum use a little more memory and more parallel processing.  Since autovacuum is triggered when a percentage of the table has changed, on a table this size you may also want to lower the percentage of rows that triggers it (see the vacuum sketch below).
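
Since you asked a general question, here are a few minimal sketches to go with the points above.  All table and column names (`big_table`, `status`, `events`, `created_at`, and so on) are hypothetical stand-ins for your own schema, and the numbers are illustrative, not recommendations.

For the statistics target (point 1), assuming a hypothetical table `big_table` with a skewed `status` column:

    -- Raise the target for one column (hypothetical names)
    ALTER TABLE big_table ALTER COLUMN status SET STATISTICS 500;

    -- Or raise the cluster-wide default, then refresh the stats
    ALTER SYSTEM SET default_statistics_target = 500;
    SELECT pg_reload_conf();
    ANALYZE big_table;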
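
For extended statistics (point 2), a sketch assuming two related columns, `city` and `zip_code`, on the same hypothetical table:

    -- Tell the planner these columns are correlated, then re-analyze
    CREATE STATISTICS big_table_city_zip (dependencies, ndistinct)
        ON city, zip_code FROM big_table;
    ANALYZE big_table;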
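
For partitioning (point 3), a sketch of declarative range partitioning on a timestamp column you almost always filter on; `events` and `created_at` are hypothetical:

    -- Parent table partitioned on the usual filter column
    CREATE TABLE events (
        id         bigint      NOT NULL,
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2023_q1 PARTITION OF events
        FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
    CREATE TABLE events_2023_q2 PARTITION OF events
        FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');

    -- Archiving old data later becomes cheap:
    ALTER TABLE events DETACH PARTITION events_2023_q1;
    -- ...then dump, move, or drop events_2023_q1 separately

Queries that include a `created_at` filter can then prune to the matching partitions; queries that don't will scan every partition.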
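
For the parallel and memory settings (points 4 and 5), a postgresql.conf sketch; the values are placeholders to show which knobs exist, not sizing advice for your hardware:

    # postgresql.conf excerpt (illustrative values only)
    max_worker_processes = 8              # overall background-worker limit
    max_parallel_workers = 8              # parallel workers across the whole cluster
    max_parallel_workers_per_gather = 4   # parallel workers a single query may use
    work_mem = 64MB                       # per sort/hash step, per backend

Keep in mind `work_mem` is allocated per sort or hash operation, so a complex query under heavy concurrency can use many multiples of it.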
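
For BRIN (point 6), a sketch on a monotonically increasing timestamp column, again with hypothetical names:

    -- BRIN works best when values correlate with the physical row order
    CREATE INDEX events_created_at_brin ON events USING brin (created_at);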
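
For vacuum tuning (point 7), a sketch of lowering the autovacuum trigger threshold on the big table and giving maintenance operations more memory; values are illustrative:

    -- Default is to trigger autovacuum after ~20% of rows change; lower it here
    ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01);

    -- More memory for vacuum's dead-tuple tracking (cluster-wide)
    ALTER SYSTEM SET maintenance_work_mem = '1GB';
    SELECT pg_reload_conf();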

You'll need to get a lot more specific about the issues you are running into for us to be able to provide more specific recommendations.


On Sat, Mar 11, 2023 at 6:48 AM André Rodrigues <db.andre@xxxxxxxxx> wrote:
Hello Guys 

Regarding a particular performance + architecture situation with Postgres 12: I have a table with 300 million rows. Which basic approaches would you suggest - postgresql.conf parameters, suitable index types, partitioning schemes - knowing that our queries use bind parameters with a range of ids (the primary key) plus 1 or 2 other columns?


Best regards
Andre 


