500M rows doesn't have to be a lot of records. Are the tuples large? If I were to partition those tables, I would partition them on an existing PK field.
Until then, I would:
- disable AUTOVACUUM on those tables immediately before the ETL job starts
- run the ETL job
- "manually" run VACUUM ANALYZE on those tables.
- enable AUTOVACUUM on those tables
On Mon, Jan 20, 2025 at 6:07 AM Siraj G <tosiraj.g@xxxxxxxxx> wrote:
Hello Guillaume!As I highlighted the records count for these tables which are quite high, would it be a best practice if we change the vacuum and analyze scale factor at the table level?Also, I am trying to understand if partitioning is required for these tables, or at least for the one which has over 500million records?RegardsSirajOn Mon, Jan 20, 2025 at 3:04 PM Guillaume Lelarge <guillaume@xxxxxxxxxxxx> wrote:Hi,Le lun. 20 janv. 2025 à 09:42, Siraj G <tosiraj.g@xxxxxxxxx> a écrit :Hello Experts!We had a performance issue with a SQL that used to complete in a few milliseconds, was taking over 14seconds. We had to run analyze on 3 tables to get the idle performance back.When the performance was not optimal, we noticed sequential scans even with indexes created.The tables and their count:coverage_details = 529628595
customer_details = 81721669
policy_details = 116909729PgSQL version is:PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bitOne more information is that we noticed this started happening (in the destination) after an ETL job completed the load (regular load). Just wanted to know if any follow up actions we should do after such data loads, eg., analyze or vacuum. We do have autovacuum on, with default values.Yes, you should run "VACUUM ANALYZE" after running a batch. autovacuum could be not fast enough to do it itself before you start querying the new data.
--Guillaume.
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!