Re: Can the query planner create indexes?

bricklen <bricklen@xxxxxxxxx> · Tue, 21 Dec 2010 08:29:27 -0800

On Tue, Dec 21, 2010 at 7:34 AM, Jeremy Harris <jgh@xxxxxxxxxxx> wrote:
> On 2010-12-21 14:26, tv@xxxxxxxx wrote:
>>>
>>> Why not auto-create indices for some limited period after database load
>>> (copy? Âany large number of inserts from a single connection?), track
>>> those
>>> that actually get re-used and remove the rest? Â Would this not provide
>>> a better out-of-the-box experience for neophytes?
>>
>> Say you have a table with several columns (A,B,C), and the query is using
>> some of them. What indexes would you create? One index on every column? A
>> multi-column index on all columns? Indexs for each combination of columns?
>
> One single-column index, on the first index-worthy column appearing.
> Keep it simple. Â ÂMaybe, while you're doing that full-table-scan. gather
> stats on all the indexable columns for later reference, to guide choice of
> which column to index later.
>
>> There really is no automatic way to solve this puzzle using a single
>> query. Indexing strategy is a very tough design discipline, and it
>> requires a complex knowledge of the workload. One slow query does not mean
>> the index should be created - what if that was just an ad-hoc query and
>> will not be executed ever again?
>
> Then the index you just built gets automatically dropped, as I said above.
>
>> I really don't want to work with products that try to be smarter than me
>> (and I admit I'm pretty dumb from time to time) and act rather randomly
>> due to this 'artificial intelligence'. I've already been there and I don't
>> want to repeat this experience.
>
> Then, since you're not a neophyte, leave the feature turned off. Â But don't
> deny the possibility of using it to actual neophytes.
>
>
>>>> as in b): Creating an index is quite expensiv
>>>
>>> How much more so than doing that full-table-scan plus sort, which your
>>> query is doing anyway?
>>
>> A lot. It consumes a lot of CPU, it may consume a lot of memory (up to
>> maintenance_work_mem) etc. So imagine a few users, building indices on a
>> big table simultaneously
>
> Pffthht. Â One simple trylock, used only by the auto-indexer.
>
>>
>> Building an index is just one side of the problem - maintenance of the
>> indexes is another thing. Each index has an impact on write operations
>> (INSERT/UPDATE) and may cause that HOT actually does not work (thus
>> causing unnecessary bloat of the index).
>
> This is worth discussing. Â ÂHow much help does the DBMS currently give
> the DBA in evaluating these tradeoffs? Â ÂCould we do better, given an
> actual installation and workload?
>
>> And the indexes may need lot of
>> space on a disk.
>
> By all means require limits as well as a "don't do that" switch.
>
>>
>> But the real show stopper is probably locking. Building an index takes a
>> write lock on a table, effectively blocking writes. Sure, you can use a
>> 'CREATE INDEX ... CONCURRENTLY' introduced in 8.2, but there are some
>> disadvantages of that (see
>>
>> http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY).
>
> The only disadvantages I see there are a) the leftover "invalid" index -
> which feels like a
> bug; why is it not auto-dropped? Â and b) the second scan, which implies
> more total work
> and a wish to background that portion after completing the query triggering
> the auto-index.
>
> Don't forget I suggested doing this only for a limited time after DB
> creation. ÂPossibly another
> reason to turn it off should be any manual index creation, as that implies
> that the DBA
> knows about indexing.
>
> I see in another thread you suggest merely placing hints in the log.
> That's a fine
> first step - but I'll then be wanting to auto-parse that log to
> auto-create....
>
> Cheers,
> Â ÂJeremy
>

This thread offers up some interesting possibilities. Expanding on
what has already been discussed, maybe a contrib module for query
tuning/index suggestions?
Some things that came to mind immediately that the module could do
(feasible or not):
- Look at the EXPLAIN ANALYZE
- Examine the index access methods and table scans + costs/rows
- Which indexes were used?
- What were the blocks/tuples hit & read?
- Look at join conditions and WHERE clause filters
- Data types in the joins (mismatched?)
- Churn rate of the tables, eg. the updates/deletes/inserts. This
might allow suggestion of other index types (eg. gist)

Tool then provides feedback on possibly helpful indexes to test, and
why (hypothetical indexes could be applied here). Possibly provided
suggestions on ways to improve the query, eg. data types don't match
in the join, EXISTS vs IN, etc

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general