Re: about multiprocessingmassdata

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4.4.2012 17:52, superman0920 wrote:
> Hi list!
>
> i have a table which has 8500000 rows records. i write a java program to
> update these records.
> i use 100 threads to update the records. For example, thread-1 update
> 1~85000 records; thread-2 update 85001~170000 and so on.
> The update sql's aim is remove the space in the column and it is simple:
> update poi set py=replace(py,' ','') where id=?;

That's a very naive approach. It's very likely each thread will do an
index scan for each update (to evaluate the 'id=?' condition. And that's
going to cost you much more than you gain because index scans are quite
CPU and I/O intensive.

Simply update the whole table by

   UPDATE poi SET py = replace(py, ' ','');

Have you actually tried how this performs or did you guess 'it's
definitely going to be very slow so I'll use multiple threads to make
that faster'?

If you really need to parallelize this, you need to do that differently
- e.g. use 'ctid' to skip to update a whole page like this:

   UPDATE poi SET py = replace(py, ' ','')
    WHERE ctid >= '(n,0)'::tid AND ctid < '(n+1,0)'::tid AND;

where 'n' ranges between 0 and number of pages the table (e.g. in pg_class).

But try the simple UPDATE first, my guess is it's going to be much
faster than you expect.

Tomas

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux