
Re: Natural key woe

Oliver

I've read your email with interest. I haven't had to deal with this sort of problem in PostgreSQL, but I have frequently dealt with it in a Sybase environment, first encountered about 25 years ago.

I am most curious to know why you didn't use the same sequence for both tables; I must be missing something. If there is a gotcha, I'd like to know about it, as I can see this being an issue in a load-sharing environment. Many years ago, before auto-sequencing was common, we set up explicit functions to generate sequence numbers. Whilst this had some performance cost in multi-user systems, it did have benefits in terms of making it easier to restrict the need for row locking to the underlying data table.
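For illustration, a minimal sketch of what I mean (the table and column names here are invented, not your actual schema): both tables take their default key from one shared sequence, so an id generated in either table is unique across the pair and the keys never collide.

    -- One sequence feeds the primary keys of both tables
    CREATE SEQUENCE issue_id_seq;

    CREATE TABLE table1 (
        id         bigint PRIMARY KEY DEFAULT nextval('issue_id_seq'),
        created_at timestamptz NOT NULL DEFAULT now(),
        subject    text
    );

    CREATE TABLE table2 (
        id         bigint PRIMARY KEY DEFAULT nextval('issue_id_seq'),
        created_at timestamptz NOT NULL DEFAULT now(),
        subject    text
    );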

Robin St.Clair
On 13/05/2014 10:44, Oliver Kohll - Mailing Lists wrote:
I'm sure no one else on this list has done anything like this, but here's a cautionary tale.

I wanted to synchronise data in two tables (issue lists) - i.e. whenever a record is added to one, add a similar record to the other. The two tables are similar in format but not exactly the same, so only a subset of fields is copied. Both tables have synthetic primary keys; these can't be used to match data, as they are auto-incrementing sequences that might interfere. What I could perhaps have done is get both tables to use the same sequence, but what I actually did is:

* join both tables based on a natural key
* use that to copy any missing items from table1 to table2
* truncate table1 and copy all of table2's rows to table1
* run this routine once an hour

The natural key was based on the creation timestamp (stored on insert) and one of the text fields, called 'subject'.
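In rough SQL terms (column names simplified for illustration, not the real schema), the hourly routine looked something like this:

    -- Step 1: copy rows that exist in table1 but not in table2,
    -- matching on the natural key (created_at, subject)
    INSERT INTO table2 (created_at, subject)
    SELECT t1.created_at, t1.subject
    FROM table1 t1
    LEFT JOIN table2 t2
           ON t2.created_at = t1.created_at
          AND t2.subject = t1.subject
    WHERE t2.created_at IS NULL;

    -- Step 2: replace table1's contents wholesale with table2's
    TRUNCATE table1;
    INSERT INTO table1 (created_at, subject)
    SELECT created_at, subject FROM table2;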

The problem came when someone entered a record with no subject, leaving it null. Once this had been copied over and was present in both tables, the *next* time the join was done a duplicate was created, because the join didn't see the two rows as matching (null != null).
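For reference, NULL = NULL evaluates to NULL rather than true in SQL, which is why the equality join misses those rows. PostgreSQL's IS NOT DISTINCT FROM treats two NULLs as equal, so a NULL-safe version of the join condition would look something like:

    -- null = null is NULL (not true), so these rows never match:
    SELECT NULL = NULL;                      -- result: NULL
    SELECT NULL IS NOT DISTINCT FROM NULL;   -- result: true

    -- NULL-safe form of the natural-key join condition:
    --    ON t2.created_at IS NOT DISTINCT FROM t1.created_at
    --   AND t2.subject    IS NOT DISTINCT FROM t1.subject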

So after one hour there were two records; after two hours, four; after three, eight; and so on.

When I logged in after 25 hours and noticed table access was a little slow, there were 2^25 ≈ 33 million records.

That's a learning experience for me, at least. It's lucky I did check it at the end of that day rather than leaving it overnight; otherwise I think our server would have ground to a halt.

One other wrinkle to note. After clearing out these rows and running 'VACUUM table2', 'ANALYZE table2' and 'REINDEX TABLE table2', some queries doing simple sequential scans were still taking a few seconds to run, even though there are only a thousand rows in the table. I finally found that running CLUSTER on the table sorted that out, even though we're on an SSD, so I'd have thought seeking all over the place for a seq. scan wouldn't have made that much difference. It obviously does still make some.
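For the record, the maintenance commands were along these lines (the index name in the CLUSTER line is a guess at the primary key's name):

    -- After deleting the ~33 million duplicate rows:
    VACUUM table2;
    ANALYZE table2;
    REINDEX TABLE table2;

    -- What finally fixed the slow sequential scans:
    CLUSTER table2 USING table2_pkey;

(CLUSTER also rewrites the table from scratch, so it discards the dead space left behind by the deleted rows, which plain VACUUM does not; that may have been a factor too, independent of the SSD question.)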

Oliver Kohll
www.agilebase.co.uk
