Re: query time

Richard Huxton <dev@xxxxxxxxxxxx> · Wed, 02 Feb 2005 15:50:55 +0000

WireSpot wrote:
On Wed, 02 Feb 2005 14:48:41 +0000, Richard Huxton <dev@xxxxxxxxxxxx> wrote:

Think about it, you'd need an index that ordered use_name so that
(john_doe, AAAAAjohn_doe, Zjohn_doe1234) were all next to each other.

If you anchor the search (LIKE 'john_doe%') and are using the C locale
then an index can be used (because it's the same as >='john_doe' AND
<'john_dof').

Unfortunately, all my cases are LIKE '%john_doe'. So I'm guessing I'll
never get an index.

Well, if you want names *ending* in john_doe you can create a function 
reverse() and then create a functional index on it. If you want 
something in the middle, tough.

How about the response time when doing both LIKE user_name and ORDER
BY timestamp? Why does it get blown out of the water like that, from 5
and 5 to 2000 ms? If a LIKE by itself takes 5 ms and an ORDER by
itself takes 5 ms... Doesn't it grab the results matching the LIKE and
the ORDER only those?

In your first '%john_doe%' example you had a LIMIT 100 without a sort. 
So - the sequential scan started and when PG found the 100th match it 
stopped. If you order by timestamp, it has to find all the matches, sort 
them by timestamp and then discard the 101st onwards.

Now, the reason it doesn't use the timestamp index instead is that it 
thinks that your LIKE is going to be very specific:

  (cost=0.00..8250.17 rows=1 width=175)

In actual fact, there isn't one matching row there are 15,481:

  (actual time=0.432..1051.036 rows=15481 loops=1)

I'm not sure that there's any way for PG to know how many matches there 
are going to be - it keeps track of values, but there's no easy way to 
model substrings of a column.

Are you sure you need to match user_name against a double-wildcarded 
value? If you do, all I could suggest is perhaps limiting the search to 
one day/week/whatever at a time, which might make the timestamp index 
seem appealing.

While we're on the subject of indexes, is there any way I can speed up
a SELECT DISTINCT user_name FROM log_entries? With 200.000 entries I
get like 46 seconds on this one.

Not sure there's a simple way to avoid a seq-scan of the whole table. PG 
has good concurrency support with MVCC, but the downside is that the 
indexes don't store whether something is visible or not. That means you 
can't just look at the index to determine what values are visible to 
your current transaction.

> I apologise if these things are already in the manual, I'm only now
> getting used to it and I don't find some things right away.

No problem - all sensible questions. Worth checking the mailing-list 
archives too though.

--
  Richard Huxton
  Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

              http://archives.postgresql.org