Search Postgresql Archives

Re: bytea encode performance issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Aug 3, 2008, at 2:36 PM, Sim Zacks wrote:

The LIKE operator is likely the problem, but it is a critical part of an email application. Searches are done by, "Show me all emails containing
the following word."

I've tried using TSearch2's full text index. It made the query 50%
faster, taking 5 minutes. This is still not even close to the less then
a minute in MySQL.


That LIKE operator is probably your problem. An unbounded LIKE like that
(with a wildcard on both sides) means no index can be used, hence you
get a sequential scan.

There are apparently some possibilities with the new GIN indexes (or
maybe even using GIST), but I haven't had an opportunity to try those
yet. There were some messages about just that on this list recently.


I don't think a functional index (or anything other then a FTI) would
accomplish anything, being that I am doing unbounded Likes.

That's why I suggested to use a text field instead of bytea.
IIRC, You can have an index on word triplets and use tsearch. I don't have intimate knowledge on how that works though, hopefully other people will chime in here.

Without the need to convert each row before comparing it, and with an appropriate index, that should significantly speed up your queries.

If you create an index, make sure you create a _functional_ index over
ENCODE(messageblk, 'escape').


Email is binary when it contains attachments. I actually planned on
using an ASCII encoding, but the dbmail people specifically said not to.
I don't know if they were speaking from experience, or because ASCII
sounds bad.

It shouldn't be; those attachments are MIME or UU encoded, are they not?
Don't confuse ASCII and SQLASCII. The latter accepts characters from any encoding, which is probably what you want.

Since when is e-mail binary data? I don't quite see why you'd use a
bytea field instead of text. If your problem is character encoding, then
just don't store that ("encode" the DB using SQLASCII).


As I mentioned, this is the system that came with dbmail. It runs on
both PostGresql and MySQL, so they may have done some compatibility
things. There are 4 statuses possible, 0,1,2,3 if you use the database
through the software then a 99 could never appear there.

The software isn't the only client that might connect to the database.
It is usually bad practice to put data constraint logic in the client instead of in the database. Especially since in client code there are usually multiple sections of code that have to deal with those constraints, which tends to result in small differences in their handling.

Next to that, if those statuses would have a proper foreign key constraint, it would be very easy to add labels to each status in a way they would make a bit more sense than 0, 1, 2, 3. I expect the label would be a sufficient foreign key by itself though, no need for those silly numbers.

Well, there's probably not much you can do about that, being just a user of dbMail.

I notice some other oddities in that query/design. Why is is_header a
smallint instead of a boolean? I'm assuming this is for compatibility
with an other database?

Why use status IN (0, 1) instead of more descriptive keys? Is it even
constrained this way, or could arbitrary numbers end up as status (say
99) and if so, what happens to those messages?

You have a very odd way of quoting. You don't top post as such, but you top post in context... Haven't seen that one before. Usually people reply _below_ a section, not above it ;) It made my reply a bit harder to write.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4895b34b243488085013917!




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux