On 10/19/06, Jens Schipkowski <jens.schipkowski@xxxxxxxxxx> wrote:
// select finds out which one has not an twin // a twin is defined as record with the same attr* values // decreases speed over time until timeout by postgresql SELECT * FROM tbl_reg reg WHERE register <> loc1 AND idreg NOT IN ( SELECT reg.idreg FROM tbl_reg reg, tbl_reg regtwin WHERE regtwin.register = 1 AND regtwin.type <> 20 AND reg.attr1 = regtwin.attr1 AND reg.attr2 = regtwin.attr2 AND reg.attr3 = regtwin.attr3 AND reg.attr4 = regtwin.attr4 AND reg.attr5 = regtwin.attr5 AND reg.attr6 = regtwin.attr6 AND reg.idreg <> regtwin.idreg AND reg.register = 2 );
[...]
We have the problem, that we cannot see any potential to improve SQL statements. Indexing the attr* columns seems not to be an solution, because the data mustn't be unique (twins) and changes really often so reindexing will took too long.
1. your database design is the real culprit here. If you want things to run really quickly, solve the problem there by normalizing your schema. denomalization is the root cause of many, many, problems posted here on this list. 2. barring that, the above query will run fastest by creating multi-column indexes on regtwin (attr*) fields. and reg(attr*). the real solution to problems like this is often proper idnexing, especially multi column. saying indexes take to long to build is like saying: 'i have a problem, so i am going to replace it with a much worse problem'. 3. try where exists/not exists instead of where in/not in merlin