Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

Merlin Moncure <mmoncure@xxxxxxxxx> · Wed, 18 Jul 2012 09:33:42 -0500

On Tue, Jul 3, 2012 at 10:22 AM, Stanislaw Pankevich
<s.pankevich@xxxxxxxxx> wrote:
> Hello,
>
> My question below is almost exact copy of the on on SO:
> http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way
>
> The post on SO caused a few answers, all as one stating "DO ONLY TRUNCATION
> - this is the fast".
>
> Also I think I've met some amount of misunderstanding of what exactly do I
> want. I would appreciate it great, if you try, as people whom I may trust in
> performance question.
>
> Here goes the SO subject, formulating exact task I want to accomplish, this
> procedure is intended to be run beetween after or before each test, ensure
> database is cleaned enough and has reset unique identifiers column (User.id
> of the first User should be nor the number left from previous test in a test
> suite but 1). Here goes the message:
>
> ==== PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
> fastest way to clean each non-empty table and reset unique identifier column
> of empty ones ====
>
> I wonder, what is the fastest way to accomplish this kind of task in
> PostgreSQL. I am interested in the fastest solutions ever possible.
>
> I found myself such kind of solution for MySQL, it performs much faster than
> just truncation of tables one by one. But anyway, I am interested in the
> fastest solutions for MySQL too. See my result here, of course it it for
> MySQL only: https://github.com/bmabey/database_cleaner/issues/126
>
> I have following assumptions:
>
>     I have 30-100 tables. Let them be 30.
>
>     Half of the tables are empty.
>
>     Each non-empty table has, say, no more than 100 rows. By this I mean,
> tables are NOT large.
>
>     I need an optional possibility to exclude 2 or 5 or N tables from this
> procedure.
>
>     I cannot! use transactions.
>
> I need the fastest cleaning strategy for such case working on PostgreSQL
> both 8 and 9.
>
> I see the following approaches:
>
> 1) Truncate each table. It is too slow, I think, especially for empty
> tables.
>
> 2) Check each table for emptiness by more faster method, and then if it is
> empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL)
> to initial state (1), i.e to restore its last_value from sequence (the same
> AUTO_INCREMENT analog) back to 1, otherwise run truncate on it.
>
> I use Ruby code to iterate through all tables, calling code below on each of
> them, I tried to setup SQL code running against each table like:
>
> DO $$DECLARE r record;
> BEGIN
>   somehow_captured = SELECT last_value from #{table}_id_seq
>   IF (somehow_captured == 1) THEN
>     == restore initial unique identifier column value here ==
>   END
>
>   IF (somehow_captured > 1) THEN
>     TRUNCATE TABLE #{table};
>   END IF;
> END$$;

This didn't work because you can't use variables for table names in
non-dynamic (that is, executed as a string) statements. You'd probably
want:

EXECUTE 'TRUNCATE TABLE ' || #{table};

As to performance, TRUNCATE in postgres (just like mysql) has the nice
property that the speed of truncation is mostly not dependent on table
size: truncating a table with 100 records is not very much faster than
truncating a table with millions of records.  For very small tables,
it might be faster to simply fire off a delete.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance