Search Postgresql Archives

Re: bytea encode performance issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Aug 3, 2008, at 12:12 PM, Sim Zacks wrote:

This is the query that is used (I know it is not as efficient as it
could be, but this is the query it comes with):

SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages m
ON p.id = m.physmessage_id WHERE mailbox_idnr = 8 AND status IN (0,1 )
AND k.is_header = '0' GROUP BY m.message_idnr,k.messageblk HAVING
ENCODE(k.messageblk::bytea,'escape') LIKE '%John%'

That LIKE operator is probably your problem. An unbounded LIKE like that (with a wildcard on both sides) means no index can be used, hence you get a sequential scan.

There are apparently some possibilities with the new GIN indexes (or maybe even using GIST), but I haven't had an opportunity to try those yet. There were some messages about just that on this list recently.

If you create an index, make sure you create a _functional_ index over ENCODE(messageblk, 'escape').

The messageblk field is a bytea in postgresql and a longblob in mysql.
The only difference in the query is the MySQL does not need the encode
function.

Since when is e-mail binary data? I don't quite see why you'd use a bytea field instead of text. If your problem is character encoding, then just don't store that ("encode" the DB using SQLASCII).

Can someone make a suggestion for tuning the database?

An explain analyze would have been a bit more useful, but the biggest issue is probably the seqscan.

The explain of the query is:
"HashAggregate  (cost=43648.11..43648.85 rows=74 width=753)"
"  ->  Nested Loop  (cost=42999.83..43647.74 rows=74 width=753)"
"        ->  Merge Join  (cost=42999.83..43043.35 rows=74 width=769)"
"              Merge Cond: (k.physmessage_id = m.physmessage_id)"
" -> Sort (cost=39264.12..39267.59 rows=1388 width=753)"
"                    Sort Key: k.physmessage_id"
"                    ->  Seq Scan on dbmail_messageblks k
(cost=0.00..39191.68 rows=1388 width=753)"
"                          Filter: ((is_header = 0::smallint) AND
(encode(messageblk, 'escape'::text) ~~ '%John%'::text))"

Here is your problem, a sequential scan over a presumably large table. It's either caused by the LIKE expression or by the lack of a functional index on messageblk, or both.

If you change the type of the messageblk field to text you won't need a functional index anymore (although that only saves time on index creation and inserts/updates).

"              ->  Sort  (cost=3735.71..3754.59 rows=7552 width=16)"
"                    Sort Key: m.physmessage_id"
"                    ->  Bitmap Heap Scan on dbmail_messages m
(cost=385.98..3249.26 rows=7552 width=16)"
" Recheck Cond: ((mailbox_idnr = 8) AND (status
= ANY ('{0,1}'::integer[])))"
"                          ->  Bitmap Index Scan on dbmail_messages_8
(cost=0.00..384.10 rows=7552 width=0)"
"                                Index Cond: ((mailbox_idnr = 8) AND
(status = ANY ('{0,1}'::integer[])))"
"        ->  Index Scan using dbmail_physmessage_pkey on
dbmail_physmessage p  (cost=0.00..8.15 rows=1 width=8)"
"              Index Cond: (k.physmessage_id = p.id)"


I notice some other oddities in that query/design. Why is is_header a smallint instead of a boolean? I'm assuming this is for compatibility with an other database?

Why use status IN (0, 1) instead of more descriptive keys? Is it even constrained this way, or could arbitrary numbers end up as status (say 99) and if so, what happens to those messages?

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,48958f34243483105918576!




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux