Re: How to do faster DML

veem v <veema0000@xxxxxxxxx> · Tue, 6 Feb 2024 11:25:05 +0530

On Tue, 6 Feb 2024 at 10:45, Lok P <loknath.73@xxxxxxxxx> wrote:
Thank you Greg.

"and keeping all your active stuff in cache. Since you have 127 columns, only pull back the columns you absolutely need for each query."

Understood the point. As postgres is a "row store" database, so keeping the size of the row lesser by making the number of columns to minimum, should definitely help as more number of rows can be packed then into one page or block (which I believe is 8KB default size per block). And that mean more number of blocks can be cached while reading, so better cache hit ratio.

As you rightly pointed out, Another thing I noticed the shared_buffer parameters set as 2029684 in this instance, which comes to ~21MB and that seems very small for a database operating in large scale. And I see we have RAM in the instance showing as ~256GB. So thinking of bumping it to something as ~40-50GB. Hope that will help to some extent. Not sure if there is methods to manually,  cache some objects(tables/indexes) which were getting used frequently by the read queries.

On Tue, Feb 6, 2024 at 10:27 AM Greg Sabino Mullane <htamfids@xxxxxxxxx> wrote:

So when you said "In that case, and based on the numbers you provided, daily partitioning seems a decent solution."
, does it mean that size of the partition (which will be ~100-150GB per partition here) is not an issue for the postgres to serve read latency sensitive queries?

Yes, that should be fine. At the end of the day, what's going to be more important is making sure you are using good indexing, and keeping all your active stuff in cache. Since you have 127 columns, only pull back the columns you absolutely need for each query.

Cheers,
Greg

Good discussion!! Don't want to divert the thread. And apology for comparing with another RDBMS, but curious to know the difference in behaviour.

With respect to the storage part:- In Oracle database we were supposed to keep the frequently accessed column first and less frequently accessed columns towards last in the table while creating table DDL. This used to help the query performance while accessing those columns as it's a row store database. Are there any such performance implications exists in postgres? And there the data types like varchar used to only occupy the space which the real data/value contains. 

But here in postgres, it seems a bit different as it has significant overhead in storage, as these seem like being fixed length data types and will occupy the space no matter what the read data holds. So curious to know, if in this case will there be any performance impact accessing those columns, if such a column which is frequently accessed but has been put towards the end of the table because of storage space saving?