Re: LIKE op with B-Tree Index?

"Sam Wong" <sam@xxxxxxxxxxxx> · Thu, 18 Oct 2012 13:58:40 +0800

> Moncure wrote on Thursday, October 18, 2012 1:45 
> On Tue, Oct 16, 2012 at 8:01 PM, Sam Wong <sam@xxxxxxxxxxxx> wrote:
> >> On Wednesday, October 17, 2012 4:30, Merlin Moncure wrote,
> >>
> >> On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong <sam@xxxxxxxxxxxx> wrote:
> >> > Hi communities,
> >> >
> >> > I am investigating a performance issue involved with LIKE 'xxxx%'
> >> > on an index in a complex query with joins.
> >> >
> >> > The problem boils down into this simple scenario---:
> >> > ====Scenario====
> >> > My database locale is C, using UTF-8 encoding. I tested this on
> >> > 9.1.6
> > and 9.
> >> > 2.1.
> >> >
> >> > Q1.
> >> > SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
> >> >
> >> > Q2.
> >> > SELECT * FROM shipments WHERE shipment_id >= '12345678' AND
> >> > shipment_id < '12345679'
> >> >
> >> > ...snip...
> >> >
> >> > ====Question====
> >> > Is Q1 and Q2 equivalent? From what I see and the result they seems
> >> > to be the same, or did I miss something? (Charset: C, Encoding:
> >> > UTF-8) If they are equivalent, is that a bug of the planner?
> >>
> >> They are most certainly not equivalent.  What if the shipping_id is
> >> 12345678Z?
> >>
> >> merlin
> >>
> > But '12345678Z' is indeed >= '12345678' AND < '12345679'. Just like
'apple'
> > < 'apples' < 'apply' in a dictionary.
> 
> Right -- I didn't visualize it properly.  Still, you're asking the server
to infer that
> since you're looking between to adjacent textual characters range bounded
[) it
> convert the 'between' to a partial
> string search.   That hold up logically but probably isn't worth
> spending cycles to do, particularly in cases of non-ascii mappable unicode
> characters.
> merlin

Postgresql did that already. Refer to the analyze result of Q1 and Q2, it
gives
"Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id <
'12345679'::text))"
(I also just realized they did it just now)

Yet, with additional Filter (ref Q1 analyze), it's surprisingly that it
estimates Q1 will have more rows that Q2.

FYI, I made a self-contained test case and submitted a bug #7610.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance