Hi Tom. Good point about the index paging out of the buffer. I did that and no change. I do have shared_buffers at 40GB, so there's a good bit there, but I also did all the things on the page you referred to, except for using COPY. At
this point the data has not been scrubbed, so I'm trapping data errors and duplicates. I am curious, though, as a sidebar, why COPY is considered faster than inserts. I was unable to get COPY faster than around 25K rows a second (pretty fast anyway). Frankly,
initially I was running 3 concurrent insert jobs and getting 90K inserts/sec, but after a certain number of records the speed just dropped off.
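For context, here is roughly the shape of the two load paths I've been comparing - just a sketch with psycopg2, and the table/column names below are placeholders rather than my real schema:

    import io
    import psycopg2
    from psycopg2.extras import execute_values

    conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
    cur = conn.cursor()

    # Path 1: batched multi-row INSERTs, committing every 5000 rows
    rows = [(i, "payload-%d" % i) for i in range(5000)]  # stand-in data
    execute_values(cur,
        "INSERT INTO my_table (id, payload) VALUES %s",
        rows, page_size=1000)
    conn.commit()

    # Path 2: COPY from an in-memory buffer
    # (one statement, no per-row parse/plan overhead)
    buf = io.StringIO()
    for i in range(5000):
        buf.write("%d,payload-%d\n" % (i, i))
    buf.seek(0)
    cur.copy_expert("COPY my_table (id, payload) FROM STDIN WITH (FORMAT csv)", buf)
    conn.commit()

My understanding is that COPY wins mainly because it avoids per-statement parsing, planning, and round trips, which matters more as the batches get large.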
pg
From: Tom Lane <tgl@xxxxxxxxxxxxx>
Sent: Wednesday, November 24, 2021 1:32 PM
To: Godfrin, Philippe E <Philippe.Godfrin@xxxxxxx>
Cc: pgsql-general@xxxxxxxxxxxxxxxxxxxx
Subject: [EXTERNAL] Re: Inserts and bad performance
"Godfrin, Philippe E" <Philippe.Godfrin@xxxxxxx> writes:
> I am inserting a large number of rows, 5, 10, 15 million. The python code commits every 5000 inserts. The table has partitioned children.
> At first, when there were a low number of rows inserted, the inserts would run at a good clip - 30 - 50K inserts per second. Now, after inserting oh say 1.5 Billion rows, the insert rate has dropped to around 5000 inserts per second. I dropped the unique
> index, rebuilt the other indexes and no change. The instance is 16 vcpu and 64GB ram.
Can you drop the indexes and not rebuild them till after the bulk load is
done? Once the indexes exceed available RAM, insert performance is going
to fall off a cliff, except maybe for indexes that are receiving purely
sequential inserts (so that only the right end of the index gets touched).
Also see
https://www.postgresql.org/docs/current/populate.html
regards, tom lane
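(For reference, the drop-then-rebuild pattern Tom and the populate.html page describe looks roughly like the sketch below; the table and index names are invented for illustration:)

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # placeholder DSN
    cur = conn.cursor()

    # 1. Drop secondary indexes before the bulk load (hypothetical names)
    cur.execute("DROP INDEX IF EXISTS my_table_payload_idx")
    conn.commit()

    # 2. ... run the bulk load here (COPY or batched INSERTs) ...

    # 3. Recreate the indexes once, after all rows are in,
    #    so each index is built in a single pass instead of row by row
    cur.execute("CREATE INDEX my_table_payload_idx ON my_table (payload)")
    conn.commit()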