Search Postgresql Archives

Re: “Loop” in plpgsql Function - PostgreSQL 9.2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



hello,


On Tue, 2016-01-19 at 10:37 +1300, drum.lucas@xxxxxxxxx wrote:
> I've created a function that allows me to do an huge update.
> 
> But I need to limit this function. I need to do 50k rows (example) and
> then stop it. After that I need to continue from the rows that I've
> stopped... I'll have to modify the call function select
> batch_number() as well.
> 
> How can I do that? Using for?
> 
>         The function below is already working, but, th
>         e table has 40m rows. And it's taking years.
>         
> 
> FUNCTION:
> 
> CREATE or REPLACE FUNCTION batch_number()
> RETURNS INTEGER AS $$
> 
> declare
>    batch_num integer;
>    offset_num integer;
> begin
>     offset_num = 0;
>     batch_num = 1;
> 
>     while (select true from gorfs.nfs_data where batch_number is null limit 1) loop
>         with ids(id) as
>         (
>             select
>                 file_id
>             from
>                 gorfs.nfs_data
>             order by
>                 file_id
>             offset offset_num
>             limit 1000
>         )
>         update
>             gorfs.nfs_data
>         set
>             batch_number = batch_num
>         from ids
>         where file_id = ids.id;
> 
>         offset_num = offset_num + 1000;
>         batch_num = batch_num + 1;
>     end loop;
> 
>     return batch_num;
> end
> 
> $$ language 'plpgsql';

do i read correctly you want to do something like that:

WITH b(n,i) AS
  (SELECT row_number() OVER (ORDER BY file_id), file_id
     FROM gorfs.nfs_data)
UPDATE gorfs.nfs_data SET batch_number = ((b.n-1)/1000)+1
  FROM b WHERE file_id = b.i

apply math to match your desired result, this one based on your fun.

which should not scan that many times for batch_number being null. i
don't know implementation details and i do not know if you have any
partial index where batch_number is null which could be used to speed
this up.

of cource this is for one run to change all at once.
but if you really want multiple tx on this how about that:

CREATE INDEX ON gorfs.nfs_data (file_id) WHERE batch_number IS NULL;

find next batchnum to use eg max(batch_number)+1 from gorfs.nfs_data,
or some sequence, locking or whatever if concurency migth byte you.
let it be YourNextBN

WITH b(i) AS
  (SELECT file_id
     FROM gorfs.nfs_data
     WHERE batch_number IS NULL ORDER BY file_id LIMIT 1000
     FOR UPDATE)
UPDATE gorfs.nfs_data SET batch_number = YourNextBN
  FROM b WHERE batch_number IS NULL AND file_id = b.i

now, wouldn't it be little simpler then your loop with multiple scanning
same data and throwing it out via offset?

on the other hand, if you think creating simple small index on file_id
where batch_number is null is not an option, then this may not give you
real speedup

of cource your real life example might be more complicated than one
column update, but looking at your fun, it scared me, i mean, your fun
updates every one row in table anyway, so why so many times with
additional scan to check loop condition? it gets worse if there is no
partial index where batch_number is null. 

regards, mariusz




-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux