make bulk deletes faster?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have the following table:

CREATE TABLE timeblock
(
  timeblockid int8 NOT NULL,
  starttime timestamp,
  endtime timestamp,
  duration int4,
  blocktypeid int8,
  domain_id int8,
  create_date timestamp,
  revision_date timestamp,
  scheduleid int8,
  CONSTRAINT timeblock_pkey PRIMARY KEY (timeblockid),
  CONSTRAINT fk25629e03312570b FOREIGN KEY (blocktypeid)
      REFERENCES blocktype (blocktypeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk25629e09be84177 FOREIGN KEY (domain_id)
      REFERENCES wa_common_domain (domain_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
) 
WITH OIDS;

CREATE INDEX timeblock_blocktype_idx
  ON timeblock
  USING btree
  (blocktypeid);

CREATE INDEX timeblock_date_idx
  ON timeblock
  USING btree
  (starttime, endtime);

CREATE INDEX timeblockepoch_idx
  ON timeblock
  USING btree
  (date_trunc('minute'::text, starttime), (date_part('epoch'::text, 
date_trunc('minute'::text, starttime)) * 1000::double precision), 
date_trunc('minute'::text, endtime), (date_part('epoch'::text, 
date_trunc('minute'::text, endtime)) * 1000::double precision));

CREATE INDEX timeblockhourmin_idx
  ON timeblock
  USING btree
  (date_part('hour'::text, starttime), date_part('minute'::text, 
starttime), date_part('hour'::text, endtime), date_part('minute'::text, 
endtime));

CREATE INDEX timeblockid_idx
  ON timeblock
  USING btree
  (timeblockid);


There are also indexes on wa_common_domain and blocktype on pkeys.

explain analyze delete from timeblock where timeblockid = 666666

Index Scan using timeblockid_idx on timeblock  (cost=0.00..5.28 rows=1 
width=6) (actual time=0.022..0.022 rows=0 loops=1)
  Index Cond: (timeblockid = 666666)
Total runtime: 0.069 ms


I need to routinely move data from the timeblock table to an archive 
table with the same schema named timeblock_archive.  I really need this 
to happen as quickly as possible, as the archive operation appears to 
really tax the db server... 

I'd like some suggestions on how to get the deletes to happen faster, as 
while deleting individually appears to extremely fast, when I go to 
delete lots of rows the operation takes an extremely long time to 
complete (5000 rows takes about 3 minutes, 1000000 rows takes almost 
close to 4 hours or more depending upon server load; wall time btw).

i've tried several different approaches doing the delete and I can't 
seem to make it much faster... anyone have any ideas?

The approaches I've taken both use a temp table to define the set that 
needs to be deleted.

Here's what I've tried:

Attempt 1:
----------
delete from timeblock where timeblockid in (select timeblockid from 
timeblock_tmp)


Attempt 2:
----------
num_to_delete := (select count(1) from tmp_timeblock);
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% 
rows]', timestart, num_to_delete;
cur_offset := 0;
while cur_offset < num_to_delete loop
   delete from timeblock where timeblockid in 
      (select timeblockid from 
         tmp_timeblock limit 100 offset cur_offset);
   get diagnostics num_affected = ROW_COUNT;
   RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% rows] 
cur_offset = %', timestart, num_affected, cur_offset;
      cur_offset := cur_offset + 100;
end loop;


Attempt 3:
----------
   num_to_delete := (select count(1) from tmp_timeblock);
   cur_offset := num_to_delete;
   RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% 
rows]', timestart, num_to_delete;
   open del_cursor for select timeblockid from tmp_timeblock;
   loop
      fetch del_cursor into del_pkey;
      if not found then
         exit;
      else
         delete from timeblock where timeblockid = del_pkey;
         get diagnostics num_affected = ROW_COUNT;
         cur_offset := cur_offset - num_affected;
         if cur_offset % 1000 = 0 then 
            RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% 
left]', timestart, cur_offset;
         end if;
      end if;
   end loop;
   close del_cursor;


I've considered using min(starttime) and max(starttime) from the temp 
table and doing: 

delete from timeblock where starttime between min and max;

however, I'm concerned about leaving orphan data, deleting too much data  
running into foreign key conflicts, etc.

dropping the indexes on timeblock could be bad, as this table recieves 
has a high volume on reads, inserts & updates.

Any one have any suggestions?

Thanks,

Jim K


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux