On Thu, 19 Oct 2006 19:32:22 +0200, Merlin Moncure <mmoncure@xxxxxxxxx>
wrote:
On 10/19/06, Jens Schipkowski <jens.schipkowski@xxxxxxxxxx> wrote:
// select finds out which one has not an twin
// a twin is defined as record with the same attr* values
// decreases speed over time until timeout by postgresql
SELECT *
FROM tbl_reg reg
WHERE register <> loc1 AND
idreg NOT IN
(
SELECT reg.idreg
FROM tbl_reg reg, tbl_reg regtwin
WHERE regtwin.register = 1 AND
regtwin.type <> 20 AND
reg.attr1 = regtwin.attr1 AND
reg.attr2 = regtwin.attr2 AND
reg.attr3 = regtwin.attr3 AND
reg.attr4 = regtwin.attr4 AND
reg.attr5 = regtwin.attr5 AND
reg.attr6 = regtwin.attr6 AND
reg.idreg <> regtwin.idreg AND
reg.register = 2
);
[...]
We have the problem, that we cannot see any potential to improve SQL
statements. Indexing the attr* columns seems not to be an solution,
because the data mustn't be unique (twins) and changes really often so
reindexing will took too long.
1. your database design is the real culprit here. If you want things
to run really quickly, solve the problem there by normalizing your
schema. denomalization is the root cause of many, many, problems
posted here on this list.
Believe it is normalized. We also seperated configuration and runtime
data. And this is a runtime table.
This table holds short living data for devices to be registered by a
registration server. The INSERTs are triggered by external devices. The
master data tables are perfectly normalized too. What you are seeing is
not the real column names. I changed it due to readability. attr* have
really different names and meanings. A "twin" (in real, initiator/member
of the same conferencing group) is defined by these attributes. Due to
high flexibility of this system (serverside configuration/ deviceside
configuration for runtime) there is no other way to normalize.
2. barring that, the above query will run fastest by creating
multi-column indexes on regtwin (attr*) fields. and reg(attr*). the
real solution to problems like this is often proper idnexing,
especially multi column. saying indexes take to long to build is like
saying: 'i have a problem, so i am going to replace it with a much
worse problem'.
I will index it. Just prepared the test and will run it tomorrow.
3. try where exists/not exists instead of where in/not in
Did try it, before I switched to NOT IN. It was 10 times slower.
merlin