As you have access to PROCEDUREs, you can create a loop and issue Update/Commit batches inside it, so something like this should work (plpgsql):
-- Wrapped in a procedure so COMMIT is allowed inside the loop (PostgreSQL 11+).
CREATE PROCEDURE batch_update_mytable()
LANGUAGE plpgsql
AS $$
declare
    icount int := 0;
    new_count int := 0;
begin
    select count(*) into icount from mytable;   -- assumes ids start near 0 with no huge gaps
    loop
        update mytable
           set myvalue = newvalue()
         where id between new_count and new_count + 9999;
        commit;                                  -- commit each 10,000-row batch
        new_count := new_count + 10000;
        exit when new_count > icount;            -- plpgsql uses EXIT, not BREAK
    end loop;
end;
$$;
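Then invoke it with a bare CALL from psql (the procedure name is just the one used in the sketch above). Note it must not be run inside an explicit BEGIN block, or the COMMITs inside the loop will error out:

CALL batch_update_mytable();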
I am going to put a caveat on this: if the newvalue() function is complex and takes a lot of CPU cycles to do its thing, then parallelism would help. But if that function looks at the table being updated, it can really complicate things, as the parallel workers would be looking at stale records, which could be bad...
On Mon, Jan 6, 2020 at 3:07 PM Israel Brewster <ijbrewster@xxxxxxxxxx> wrote:
Good information. I did forget to mention that I am using PostgreSQL 11.5. I also was not aware of the distinction between PROCEDURE and FUNCTION, so I guess I used the wrong terminology there when stating that new_value is the result of a stored procedure. It’s actually a function.

So would your suggestion then be to create a procedure that loops through the records, calculating and committing each one (or, as in your older Postgres example, batches of 10k to 20k)?

Good point on the HD I/O bound vs. processor bound question, but wouldn’t that depend on how complicated the actual update is? Still, there is a good chance you are correct in that statement, so that aspect is probably not worth spending too much time on.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145

On Jan 6, 2020, at 10:05 AM, Justin <zzzzz.graf@xxxxxxxxx> wrote:

There are several ways to actually do this.

If you have PostgreSQL 11 or higher, we now have CREATE PROCEDURE, which allows committing transactions; one drawback is it cannot parallelize from inside the procedure.

If it's an older version, then a Python script or other scripting language that iterates over the data, say 10 to 20K rows at a time, will do what you want (a runnable sketch follows at the end of the thread), roughly:

for i in list of IDs:
    begin;
    UPDATE table_name SET changed_field = new_value() where int4range(i, i+10000) @> id;
    commit;

To create parallel processes, a simple Python script or other scripting language can be used to create many connections working the data in parallel, but given the simple update it will NOT help performance: this will be hard-disk I/O bound, not process bound, which is where parallelization helps.

On Mon, Jan 6, 2020 at 1:36 PM Israel Brewster <ijbrewster@xxxxxxxxxx> wrote:

Thanks to a change in historical data, I have a need to update a large number of records (around 50 million). The update itself is straightforward, as I can just issue an "UPDATE table_name SET changed_field=new_value();" (yes, new_value is the result of a stored procedure, if that makes a difference) command via psql, and it should work. However, due to the large number of records, this command will obviously take a while, and if anything goes wrong during the update (one bad value in row 45 million, lost connection, etc.), all the work that has been done already will be lost due to the transactional nature of such commands (unless I am missing something).

Given that each row update is completely independent of any other row, I have the following questions:

1) Is there any way to set the command such that each row change is committed as it is calculated?

2) Is there some way to run this command in parallel in order to better utilize multiple processor cores, other than manually breaking the data into chunks and running a separate psql/update process for each chunk? Honestly, manual parallelizing wouldn’t be too bad (there are a number of logical segregations I can apply), I’m just wondering if there is a more automatic option.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
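For the pre-11 scripting approach described above, here is a minimal Python sketch using psycopg2. The connection string, table name (table_name), column name (changed_field), and the assumption of a dense integer id column are placeholders to adapt; each batch is committed on its own, so a failure only loses the batch in flight.

import psycopg2

BATCH = 10000  # rows per transaction, per the 10-20K suggestion above

conn = psycopg2.connect("dbname=mydb")  # placeholder connection settings
cur = conn.cursor()

# Assumes an integer id column without huge gaps; adjust to your schema.
cur.execute("SELECT coalesce(max(id), 0) FROM table_name")
max_id = cur.fetchone()[0]

start = 0
while start <= max_id:
    cur.execute(
        "UPDATE table_name SET changed_field = new_value() "
        "WHERE id >= %s AND id < %s",
        (start, start + BATCH),
    )
    conn.commit()  # commit each batch so an error only rolls back the batch in flight
    start += BATCH

cur.close()
conn.close()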