On 4/6/24 08:47, yudhi s wrote:
Thank you Adrian, Greg and Veem.
I tried writing a small routine to see how the performance differs in
these four approaches i.e. Upsert VS traditional update+insert VS Merge
vs Truncate+load.
Initially I was thinking Upsert will perform the same as Merge as the
logic looks similar but it seems it's the worst performing among all,
not sure why , yet to know the reason though. Truncate+ load seems to be
the best performing among all. Hope i am doing it correctly. Please
correct me if I'm wrong.
Your original problem description was:
"Then subsequently these rows will be inserted/updated based on the
delta number of rows that got inserted/updated in the source database.
In some cases these changed data can flow multiple times per day to the
downstream i.e. postgres database and in other cases once daily."
If the above is not a hard rule, then yes up to some point just
replacing the data in mass would be the simplest/fastest method. You
could cut a step out by doing something like TRUNCATE target_tab and
then COPY target_tab FROM 'source.csv' bypassing the INSERT INTO source_tab.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx