Well, IN != EXISTS != JOIN. Exists just stops as soon as it finds a
record. For some cases, it's equivalent to IN, but not all. IN has to
de-duplicate it's list in some fashion. For small IN lists, you can do
this with an OR, but at some point you need to switch to an actual
unique (actually, I suspect the difference in PostgreSQL just depends on
if you passed values into IN or a subquery). A join on the other hand
doesn't worry about duplicates at all. There may be some brains in the
planner that realize if a subquery will return a unique set (ie: you're
querying on a primary key).
I agree, and it makes sense now that I consider it that IN would force the planner to implement some
form of unique check - possibly leveraging a PK or unique index if one is already available. Maybe
I'll tack up a note to the online documentation letting people know so that it's a little more
explicitly clear that when you choose IN on data that isn't explicitly unique (to the planner i.e.
post-analyze) you get the baggage of a forced unique whether you need it or not. Or perhaps someone
that knows the internals of the planner a little better than me should put some info up regarding that?
Just one more thing... I have found that maintaining a btree index on a
varchar(255) value is extremely expensive on insert/update/delete. It is
unfortunately necessary for me to maintain this index for queries and
reports so I am transitioning to using an unindexed staging table to
import data into before merging it with the larger table. All the docs
and posts recommend is to drop the index, import your data, and then
create the index again. This is untenable on a daily / bi-weekly basis.
Is there a more elegant solution to this indexing problem?
You might be happier with tsearch than a regular index.
Thanks, I'll look into using tsearch2 as a possibility. From what I've seen so far it would add
quite a bit of complexity (necessary updates after inserts, proprietary query syntax that might
require a large amount of specialization from client apps) but in the end the overhead may be less
than that of maintaining the btree.
Thanks and Regards,
B