On Wed, 02 Feb 2005 14:48:41 +0000, Richard Huxton <dev@xxxxxxxxxxxx> wrote:
Think about it, you'd need an index that ordered use_name so that (john_doe, AAAAAjohn_doe, Zjohn_doe1234) were all next to each other.
If you anchor the search (LIKE 'john_doe%') and are using the C locale then an index can be used (because it's the same as >='john_doe' AND <'john_dof').
Unfortunately, all my cases are LIKE '%john_doe'. So I'm guessing I'll never get an index.
Well, if you want names *ending* in john_doe you can create a function reverse() and then create a functional index on it. If you want something in the middle, tough.
How about the response time when doing both LIKE user_name and ORDER BY timestamp? Why does it get blown out of the water like that, from 5 and 5 to 2000 ms? If a LIKE by itself takes 5 ms and an ORDER by itself takes 5 ms... Doesn't it grab the results matching the LIKE and the ORDER only those?
In your first '%john_doe%' example you had a LIMIT 100 without a sort. So - the sequential scan started and when PG found the 100th match it stopped. If you order by timestamp, it has to find all the matches, sort them by timestamp and then discard the 101st onwards.
Now, the reason it doesn't use the timestamp index instead is that it thinks that your LIKE is going to be very specific:
(cost=0.00..8250.17 rows=1 width=175)
In actual fact, there isn't one matching row there are 15,481:
(actual time=0.432..1051.036 rows=15481 loops=1)
I'm not sure that there's any way for PG to know how many matches there are going to be - it keeps track of values, but there's no easy way to model substrings of a column.
Are you sure you need to match user_name against a double-wildcarded value? If you do, all I could suggest is perhaps limiting the search to one day/week/whatever at a time, which might make the timestamp index seem appealing.
While we're on the subject of indexes, is there any way I can speed up a SELECT DISTINCT user_name FROM log_entries? With 200.000 entries I get like 46 seconds on this one.
Not sure there's a simple way to avoid a seq-scan of the whole table. PG has good concurrency support with MVCC, but the downside is that the indexes don't store whether something is visible or not. That means you can't just look at the index to determine what values are visible to your current transaction.
> I apologise if these things are already in the manual, I'm only now > getting used to it and I don't find some things right away.
No problem - all sensible questions. Worth checking the mailing-list archives too though.
-- Richard Huxton Archonet Ltd
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org