PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

Stanislaw Pankevich <s.pankevich@xxxxxxxxx> · Tue, 3 Jul 2012 18:22:43 +0300

Hello, 

My question below is almost exact copy of the on on SO: http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way

The post on SO caused a few answers, all as one stating "DO ONLY TRUNCATION - this is the fast".

Also I think I've met some amount of misunderstanding of what exactly do I want. I would appreciate it great, if you try, as people whom I may trust in performance question.

Here goes the SO subject, formulating exact task I want to accomplish, this procedure is intended to be run beetween after or before each test, ensure database is cleaned enough and has reset unique identifiers column (User.id of the first User should be nor the number left from previous test in a test suite but 1). Here goes the message:

==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty _ones_ ====

I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible.

I found myself such kind of solution for MySQL, it performs much faster than just truncation of tables one by one. But anyway, I am interested in the fastest solutions for MySQL too. See my result here, of course it it for MySQL only: https://github.com/bmabey/database_cleaner/issues/126

I have following assumptions:

    I have 30-100 tables. Let them be 30.

    Half of the tables are empty.

    Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large.

    I need an optional possibility to exclude 2 or 5 or N tables from this procedure.

    I cannot! use transactions.

I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9.

I see the following approaches:

1) Truncate each table. It is too slow, I think, especially for empty tables.

2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. 

I use Ruby code to iterate through all tables, calling code below on each of them, I tried to setup SQL code running against each table like:

DO $$DECLARE r record;
BEGIN 
  somehow_captured = SELECT last_value from #{table}_id_seq

  IF (somehow_captured == 1) THEN
    == restore initial unique identifier column value here ==
  END

  IF (somehow_captured > 1) THEN
    TRUNCATE TABLE #{table};
  END IF;
END$$;

Manipulating this code in various aspects, I couldn't make it work, because of I am unfamiliar with PostgreSQL functions and blocks (and variables).

Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one of the "check procedure" units, cleaning procedure should consist of, but haven't accomplished it too.

I would appreciate any hints on how this procedure could be accomplished in PostgreSQL native way.

Thanks!

UPDATE:

I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers, in my case the switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post here the answers about "Obviously, you need DatabaseCleaner for PG" and so on and so on.

==== post ends ====

Thanks,

Stanislaw.