Jack:

On Sun, 14 Apr 2024 at 18:20, jack <jack4pg@xxxxxxx> wrote:
> The full error reads:
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> error: connection to server was lost

This is a CLIENT-side error. What people are asking you for is the
corresponding error in the SERVER log. When the server has a severe
error ( probably a resource exhaustion, which normally leads to
abnormal termination, as has been pointed out previously ) the
connection gets closed, and you should go to the server log to look
for the root cause.

> PostgreSQL 16.2
> I also believe it is a resource issue which can be rectified with a setting, but which setting?

Not all resource issues can be solved; sometimes one asks for
something which is too much for the available hardware. Anyway, as
pointed out, posting the configuration and the server error, and maybe
monitoring memory ( the usual culprit ) with top, may help someone
help you.

Look at memory first because a common cause of this kind of thing on
Linux is having overcommit on ( overcommit is nice, but it basically
lies to postgres: it tells the server it can have X GB and then, when
the server tries to use them, kills -9 it ).

> If you were updating 100 million records what settings would you adjust?

From what you told us earlier ( loading from CSV ) and with ....

> Here are the updates I am performing on the 100 million records:
> UPDATE table SET category_modified = UPPER(category);
> UPDATE table SET category_modified = REGEXP_REPLACE(REPLACE(REPLACE(category_modified, '''','-'), '`', '-'), '\s{2,}', ' ', 'g') WHERE LENGTH(category_modified)>1 AND POSITION('--' IN category_modified)>0;
> UPDATE table SET category_modified = REPLACE(category_modified,' ','-');
> UPDATE table SET category_modified = CASE WHEN category_modified IS NOT NULL THEN regexp_replace(category_modified, '[^a-zA-Z]$', '') ELSE NULL END;
> UPDATE table SET category_modified = regexp_replace(category_modified, '-{2,}', '-', 'g');
> UPDATE table SET category_modified = SUBSTRING(category_modified FROM 1 FOR LENGTH(category_modified) - 1) WHERE LENGTH(category_modified)>1 AND category_modified LIKE '%-';

... for this kind of update I would recommend stream-filtering the
data on load. I do not know how you do it presently, but those are six
hundred million row updates ( unless some are suppressed, which I do
not remember now ), which makes for huge transactions and is asking
for problems ( or maybe only 100_000_000 per transaction if you are
autocommitting each statement ).

If postgres does not suppress redundant updates ( lots of people here
can answer that ) I would start by conditioning the updates, e.g.:

    UPDATE table SET category_modified = UPPER(category)
    WHERE category_modified IS DISTINCT FROM UPPER(category);

This can shave time and resource usage.

But if your updates are of this kind ( transformations of the current
row, and your sample touches just a single value of each row, which is
even easier, without touching anything else in the database ), I would
encourage you to use a small perl/awk/python/whatever program to
filter and transform the data before loading it. It has several
advantages: one is speed, another is minimizing server load, another
is testability ( pipe sample lines to the filter, check the result,
and when satisfied pipe the full file through the filter into psql ).
Choosing the right language ( I would say perl, it was designed
practically for this sort of thing ) makes the filtering really easy.
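To make the filtering approach concrete, here is a rough, untested
sketch in python ( perl would do just as well ). Note the assumptions:
the category column position, no header line, and I apply all six
transformations unconditionally, folding away the WHERE condition of
your second update, so check it against sample lines before trusting
it:

    #!/usr/bin/env python3
    # Sketch: read CSV on stdin, append a transformed
    # category_modified column, write CSV on stdout, ready for COPY.
    import csv, re, sys

    def transform(cat):
        if not cat:                      # mimic the strict/NULL handling
            return cat
        v = cat.upper()                              # UPPER(category)
        v = v.replace("'", '-').replace('`', '-')    # quotes to dashes
        v = re.sub(r'\s{2,}', ' ', v)                # squeeze whitespace
        v = v.replace(' ', '-')                      # spaces to dashes
        v = re.sub(r'[^A-Za-z]$', '', v)             # drop a trailing non-letter
        v = re.sub(r'-{2,}', '-', v)                 # squeeze dashes
        if len(v) > 1 and v.endswith('-'):
            v = v[:-1]                               # drop a trailing dash
        return v

    reader = csv.reader(sys.stdin)
    writer = csv.writer(sys.stdout, lineterminator='\n')
    for row in reader:
        row.append(transform(row[0]))    # assumes category is column 0
        writer.writerow(row)

You would test it on a handful of lines first and then do something
like ( filter.py, mydb and mytable are placeholders ):

    python3 filter.py < data.csv \
      | psql -d mydb -c '\copy mytable FROM STDIN WITH (FORMAT csv)'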
And it seems you are relying on it: I see null handling in some
updates but not in others, which hints at relying on strict behaviour
( functions returning null on null input ).

The only apparent difficulty is parsing the CSV, but there are plenty
of modules for parsing the dialect understood by postgres ( and if you
generate the data yourself, the default text format for postgres is
much easier to parse, i.e. in perl a simple split /\t/ normally does
the trick ).

Doing it with a filtering stream would allow you to easily process
gigabytes of data using a few megabytes of RAM in the client. I'm not
sure about the server, but stream filtering lends itself to very easy
batching of the copies, and from what I read your server is beefy.

Francisco Olarte.
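PS: in case it is useful, a rough, untested sketch of what I mean by
batching the copies, in python with psycopg2 ( the connection string,
table name and batch size are made up; note the naive line-based
chunking assumes no newlines inside quoted CSV fields ):

    #!/usr/bin/env python3
    # Sketch: feed an already-filtered CSV stream to the server in
    # fixed-size COPY batches, one transaction per batch, so no
    # single transaction grows huge.
    import io, sys
    import psycopg2

    BATCH = 100_000                          # rows per transaction

    def flush(cur, conn, buf):
        buf.seek(0)
        cur.copy_expert("COPY mytable FROM STDIN WITH (FORMAT csv)", buf)
        conn.commit()

    conn = psycopg2.connect("dbname=mydb")   # made-up connection string
    cur = conn.cursor()
    buf, n = io.StringIO(), 0
    for line in sys.stdin:
        buf.write(line)
        n += 1
        if n == BATCH:
            flush(cur, conn, buf)
            buf, n = io.StringIO(), 0
    if n:
        flush(cur, conn, buf)                # last partial batch
    conn.close()

You would pipe the output of the filter above into it instead of into
psql.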