On 2024-02-04 02:14:20 +0530, Lok P wrote:
> However, as we have ~5 billion rows in the base table and out of
> that, we were expecting almost half i.e. ~2 billion would be
> duplicates.

That's way more than I expected from your original description. And it
of course raises the question whether it's a good idea to just throw
away all that data or if you need to keep it in a normalized way.

> And you said, doing the inserts using the "order by CTID Offset"
> approach must cause one full sequential scan of the whole table for
> loading each chunk/10M of rows and that would take a long time I
> believe.
>
> I am still trying to understand the other approach which you
> suggested. Not able to understand "you can select where index_col >
> last order by index_col limit 10M,".
> However, to get the max ID value of the last 10M loaded rows in
> target, do you say that having a PK index created on that target
> table column (ID) will help,

Yes. Getting the maximum value from an index is a very fast operation.
You just have to traverse down the right edge of the tree (or you may
even be able to access the right-most leaf page directly).

> and we can save the max(ID) value subsequently in another table to
> fetch and keep loading from the source table (as ID > Max_ID stored
> in temp table)?

Another table or a variable in a script. Personally, if I need to do
something repeatedly, I usually write a script in the scripting
language I feel most comfortable in (which has been Python for the
last 7 or 8 years, Perl before that), which gives you variables,
loops, conditionals and - above all - repeatability.

> OR
> Would it be better to do it in one shot only, but by setting a higher
> value of some parameters like "maintenance_work_mem" or
> "max_parallel_workers"?

Hard to say. Normally, processing in fewer, bigger chunks is faster.
But RAM is much faster than disk (even with SSDs), so it might be
faster to make work_mem as large as you can and then use a chunk size
which just fits inside work_mem. Of course finding that sweet spot
takes experimentation, hence time, and it may make little sense to
experiment for 20 hours just to save 40 minutes.
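To make that concrete, here is a rough sketch of such a script in
Python with psycopg2. The connection string and the names
(source_table, target_table, id) are placeholders, it assumes id is
unique in the source and is the primary key of the target, and it
leaves out whatever logic you use to actually drop the duplicates: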
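    import psycopg2

    CHUNK_SIZE = 10_000_000        # 10M rows per chunk, as in the thread

    # Placeholder connection string - adjust for your environment.
    conn = psycopg2.connect("dbname=mydb")

    with conn.cursor() as cur:
        # Resume where a previous run left off. max() on an indexed
        # column only has to walk down the right edge of the btree, so
        # this is fast even on a billion-row table.
        cur.execute("SELECT coalesce(max(id), 0) FROM target_table")
        last_id = cur.fetchone()[0]

    while True:
        with conn.cursor() as cur:
            # Keyset pagination: WHERE id > %s ... ORDER BY id LIMIT %s
            # starts an index range scan just after last_id instead of
            # re-reading the whole table the way ORDER BY ctid OFFSET
            # would. Assumes id is unique in the source; with duplicate
            # ids, rows sharing the boundary value could be skipped.
            cur.execute("""
                INSERT INTO target_table
                SELECT *
                FROM source_table
                WHERE id > %s
                ORDER BY id
                LIMIT %s
                """, (last_id, CHUNK_SIZE))
            inserted = cur.rowcount
            # Fast max() via the PK index on the target, as above.
            cur.execute("SELECT max(id) FROM target_table")
            last_id = cur.fetchone()[0]
        conn.commit()              # one transaction per chunk
        print(f"loaded {inserted} rows, up to id {last_id}")
        if inserted < CHUNK_SIZE:  # short chunk: we've reached the end
            break

    conn.close()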
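Because each chunk commits separately, the script is restartable: if
it dies, you can just run it again and it continues from the current
max(id) in the target. Your dedup logic would go into the
INSERT ... SELECT - for example, one way to drop duplicates across
chunk boundaries is ON CONFLICT DO NOTHING against a unique index on
the target - and CHUNK_SIZE is the knob to experiment with against
work_mem as discussed above.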
        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         | -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |    challenge!"