Re: LIKE op with B-Tree Index?

Merlin Moncure <mmoncure@xxxxxxxxx> · Wed, 17 Oct 2012 12:45:25 -0500

On Tue, Oct 16, 2012 at 8:01 PM, Sam Wong <sam@xxxxxxxxxxxx> wrote:
>> On Wednesday, October 17, 2012 4:30, Merlin Moncure wrote,
>>
>> On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong <sam@xxxxxxxxxxxx> wrote:
>> > Hi communities,
>> >
>> > I am investigating a performance issue involved with LIKE 'xxxx%' on
>> > an index in a complex query with joins.
>> >
>> > The problem boils down into this simple scenario---:
>> > ====Scenario====
>> > My database locale is C, using UTF-8 encoding. I tested this on 9.1.6
> and 9.
>> > 2.1.
>> >
>> > Q1.
>> > SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
>> >
>> > Q2.
>> > SELECT * FROM shipments WHERE shipment_id >= '12345678' AND
>> > shipment_id < '12345679'
>> >
>> > ...snip...
>> >
>> > ====Question====
>> > Is Q1 and Q2 equivalent? From what I see and the result they seems to
>> > be the same, or did I miss something? (Charset: C, Encoding: UTF-8) If
>> > they are equivalent, is that a bug of the planner?
>>
>> They are most certainly not equivalent.  What if the shipping_id is
>> 12345678Z?
>>
>> merlin
>>
> But '12345678Z' is indeed >= '12345678' AND < '12345679'. Just like 'apple'
> < 'apples' < 'apply' in a dictionary.

Right -- I didn't visualize it properly.  Still, you're asking the
server to infer that since you're looking between to adjacent textual
characters range bounded [) it convert the 'between' to a partial
string search.   That hold up logically but probably isn't worth
spending cycles to do, particularly in cases of non-ascii mappable
unicode characters.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance