Re: How to insert a bulk of data with unique-violations very fast

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 3, 2010 at 11:19 AM, Torsten Zühlsdorff
<foo@xxxxxxxxxxxxxxxxxxx> wrote:
> Scott Marlowe schrieb:
>>
>> On Tue, Jun 1, 2010 at 9:03 AM, Torsten Zühlsdorff
>> <foo@xxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hello,
>>>
>>> i have a set of unique data which about 150.000.000 rows. Regullary i get
>>> a
>>> list of data, which contains multiple times of rows than the already
>>> stored
>>> one. Often around 2.000.000.000 rows. Within this rows are many
>>> duplicates
>>> and often the set of already stored data.
>>> I want to store just every entry, which is not within the already stored
>>> one. Also i do not want to store duplicates. Example:
>>
>> The standard method in pgsql is to load the data into a temp table
>> then insert where not exists in old table.
>
> Sorry, i didn't get it. I've googled some examples, but no one match at my
> case. Every example i found was a single insert which should be done or
> ignored, if the row is already stored.
>
> But in my case i have a bulk of rows with duplicates. Either your tipp
> doesn't match my case or i didn't unterstand it correctly. Can you provide a
> simple example?

create table main (id int primary key, info text);
create table loader (id int, info text);
insert into main values (1,'abc'),(2,'def'),(3,'ghi');
insert into loader values (1,'abc'),(4,'xyz');
select * from main;
 id | info
----+------
  1 | abc
  2 | def
  3 | ghi
(3 rows)

select * from loader;
 id | info
----+------
  1 | abc
  4 | xyz
(2 rows)

insert into main select * from loader except select * from main;
select * from main;
 id | info
----+------
  1 | abc
  2 | def
  3 | ghi
  4 | xyz
(4 rows)

Note that for the where not exists to work the fields would need to be
all the same, or you'd need a more complex query.  If the info field
here was different you'd get an error an no insert / update.  For that
case you might want to use "where not in":

insert into main select * from loader where id not in (select id from main);

If you wanted the new rows to update pre-existing rows, then you could
run an update first where the ids matched, then the insert where no id
matches.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux