stored proc and inserting hundreds of thousands of rows

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have a stored proc that potentially inserts hundreds of thousands, potentially millions, of rows (below).

This stored proc is part of the the sequence of creating an ad campaign and links an ad to documents it should be displayed with.

A few of these stored procs can run concurrently as users create ad campaigns.

We have 2 million documents now and linking an ad to all of them takes 5 minutes on my top-of-the-line SSD MacBook Pro.

Last but not least, the system has to quickly serve ads while documents are being linked which is a problem at the moment.

What can I do to make linking documents to ads faster or have less impact on the system. I would like the system to be as responsive with serving ads while the linking itself is allowed to take a few minutes. 

One thing I'm concerned with, for example, is the whole multi-million row insert running within the stored proc transaction. I think inserting rows one by one or in small batches may be an improvement. I don't know how to accomplish this, though.

	Thanks, Joel

---

CREATE DOMAIN doc_id AS varchar(64);
CREATE DOMAIN id AS int;

CREATE TABLE doc_ads
(
  doc_id    id NOT NULL REFERENCES docs,
  ad_id     id NOT NULL REFERENCES ads,
  distance  float NOT NULL
);

CREATE INDEX doc_ads_idx ON doc_ads(doc_id);

CREATE OR REPLACE FUNCTION link_doc_to_ads(doc id, threshold float) 
RETURNS void AS $$
BEGIN
  INSERT INTO doc_ads (doc_id, ad_id, distance)
  SELECT doc, (t).ad_id, (t).distance
  FROM (SELECT ads_within_distance(topics, threshold) AS t
       FROM docs
       WHERE id = doc) AS x;
  ANALYZE doc_ads;
END;
$$ LANGUAGE plpgsql;

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------




-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux