Re: Is there an md5sum for tables?

Steve Atkins <steve@xxxxxxxxxxx> · Wed, 2 Apr 2008 09:43:49 -0700

On Apr 1, 2008, at 8:56 PM, Michael Enke wrote:
Hi all,
I need to know if multiple tables have (may have most probably)  
identical content.
Since I want a fast solution (which means not comparing tables row  
by row),
I thought it would be a good idea to have an sql function operating  
on a table or view
similar to md5sum on a file and only compare the generated hashes.
I did not find such a function in the documentation.

Is such a function already available?

No, but you could probably build it fairly easily from parts that
are already there.

If not, does it make no sense to use such an approach?

Possibly, but generating the hash might be time consuming,
as you'd need to iterate through the whole table, probably
in a specific order, to generate the hash. So every time you
modified the table you'd have to do something like
"select * from table order by <something unique>" to
maintain a current hash. (If you used a weak hash, something
like XOR of the md5 of each row, then you could do it cheaply
incrementally - and that might even be a usable hash if there's
a unique constraint on some field).

Overall, though, this is a very non-database thing to want to
do. Iit sounds like you may have made some bad
decisions mapping your data into the database. Usually
you'd abstract out common data and refer to it via a foreign
key, or somesuch.

What's the actual application?

Cheers,
  Steve

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general