Re: Can the query planner create indexes?

tv@xxxxxxxx · Tue, 21 Dec 2010 15:26:29 +0100

> On 2010-12-21 10:42, Massa, Harald Armin wrote:
>> b) creating an index requires to read the data-to-be-indexed. So, to
>> have an
>> index pointing at the interesting rows for your query, the table has to
>> be
>> read ... which would be the perfect time to allready select the
>> interesting
>> rows. And after having the interesting rows: the index is worthless
>
> ... until another similar query comes along, when suddenly it's a massive
> win.
> Why not auto-create indices for some limited period after database load
> (copy?  any large number of inserts from a single connection?), track
> those
> that actually get re-used and remove the rest?   Would this not provide
> a better out-of-the-box experience for neophytes?

Say you have a table with several columns (A,B,C), and the query is using
some of them. What indexes would you create? One index on every column? A
multi-column index on all columns? Indexs for each combination of columns?

There really is no automatic way to solve this puzzle using a single
query. Indexing strategy is a very tough design discipline, and it
requires a complex knowledge of the workload. One slow query does not mean
the index should be created - what if that was just an ad-hoc query and
will not be executed ever again?

So although this 'automatic index creation' seems nice, it really does not
work in practice.

I really don't want to work with products that try to be smarter than me
(and I admit I'm pretty dumb from time to time) and act rather randomly
due to this 'artificial intelligence'. I've already been there and I don't
want to repeat this experience.

>
> [...]
>> Why is the query planner not allowed to create indexes, but only allowed
>> to
>>> use or not use what's available?
>>>
>>
>> as in b): Creating an index is quite expensiv
>
> How much more so than doing that full-table-scan plus sort, which your
> query is doing anyway?

A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
maintenance_work_mem) etc. So imagine a few users, building indices on a
big table simultaneously - no, I really don't want to see this on a
production server.

Building an index is just one side of the problem - maintenance of the
indexes is another thing. Each index has an impact on write operations
(INSERT/UPDATE) and may cause that HOT actually does not work (thus
causing unnecessary bloat of the index). And the indexes may need lot of
space on a disk.

But the real show stopper is probably locking. Building an index takes a
write lock on a table, effectively blocking writes. Sure, you can use a
'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
disadvantages of that (see
http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY).

cheers
Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general