Re: Understanding "seq scans"

Merlin Moncure <mmoncure@xxxxxxxxx> · Wed, 14 Oct 2015 11:31:11 -0500

On Tue, Oct 13, 2015 at 11:54 AM, Alvaro Herrera
<alvherre@xxxxxxxxxxxxxxx> wrote:
> Lele Gaifax wrote:
>> Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> writes:
>>
>> > So 10% of your rows in the master_l10n table start with "quattro"?
>> > That's pretty odd, isn't it?  How did you manufacture these data?
>>
>> Well, not a real scenario for sure, but definitely not odd: I just needed an
>> "extremely" big dataset to test out several different strategies, both on
>> table layout and indexes. The tables are populated by mechanically translating
>> the integer primary key into the corresponding "in words" string (1 -> "one")...
>
> I imagined it would be something like that.  It's not the most useful
> set of test data, precisely because it doesn't accurately reflect what
> you're going to have in practice.  I suggest you enter some actual text,
> even if it's just text from Don Camillo or whatever.
>
>> > How often are you going to look for translated text without specifying a
>> > language?
>>
>> Never. The most frequently used criterion is «LIKE '%word%'» in the context of
>> a user session, and thus with a "preferred language".
>
> Be very careful with a % at the left.  The index is not going to work at
> all there.  It is not the same as looking for stuff without a % at the
> left.

Note that you can optimize LIKE '%foo%' with a pg_trgm GIN index.
Trigram-based indexing is something of a mixed bag, but recent
enhancements are about to make it a lot faster, so it should mostly
match or beat the brute-force search.  This is the preferred solution
if you need to do partial string matching -- for most other cases of
attribute searching I'd be looking at jsonb.  Welcome to Postgres, OP!
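
For what it's worth, here is a rough illustration of why a trigram
index can help with a pattern like '%foo%'.  It is a toy Python sketch,
not how pg_trgm is actually implemented: the function names and the
sample rows (Italian number words, in the spirit of the test data
described above) are made up for the example.  The idea is to index
every 3-character substring, use the pattern's trigrams to narrow down
the candidate rows, and then recheck each candidate against the real
pattern.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(rows):
    """Map each trigram to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for tri in trigrams(text):
            index[tri].add(row_id)
    return index


def search_substring(rows, index, needle):
    """Emulate LIKE '%needle%' using the trigram index plus a recheck."""
    wanted = trigrams(needle)
    if not wanted:
        # Fewer than 3 characters: no trigrams to look up, so every row
        # is a candidate (the equivalent of a full scan).
        candidates = set(rows)
    else:
        # A matching row must contain every trigram of the needle.
        candidates = set.intersection(*(index.get(t, set()) for t in wanted))
    # Recheck: sharing all trigrams does not guarantee a real match.
    return sorted(r for r in candidates if needle in rows[r])


# Hypothetical sample data: integer key -> number "in words".
rows = {4: "quattro", 5: "cinque", 14: "quattordici", 24: "ventiquattro"}
index = build_index(rows)

print(search_substring(rows, index, "quattro"))  # [4, 24]
print(search_substring(rows, index, "att"))      # [4, 14, 24]
```

In Postgres itself you would not write any of this by hand: the
pg_trgm extension provides the gin_trgm_ops operator class for GIN
indexes, and the planner can then consider such an index for LIKE
patterns with a leading %.  Note the fallback branch in the sketch: a
needle shorter than three characters yields no trigrams, so nothing
can be narrowed down before the recheck.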

merlin
