Re: Suboptimal query plan when using expensive BCRYPT functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst <erik.van.zijst@xxxxxxxxx> wrote:
Hi there,

I've got a relatively simple query that contains expensive BCRYPT
functions that gets optimized in a way that causes postgres to compute
more bcrypt hashes than necessary, thereby dramatically slowing things
down.

In a certain part of our application we need to lookup users by their
username, email address and password. Now we don't store plaintext
passwords and so the query needs to compute bcrypt hashes on the fly:

    SELECT DISTINCT u.*
    FROM auth_user u
    JOIN bb_userprofile p ON p.user_id = u.id
    JOIN bb_identity i ON i.profile_id = p.id
    WHERE
    (
      (
        u.username ILIKE 'detkin'
        OR
        i.email ILIKE 'foo@xxxxxxxxxxx'
      )
      AND
      (
        SUBSTRING(password FROM 8) = CRYPT(
          'detkin', SUBSTRING(password FROM 8))
      )
    )

These queries are generated by a parser that translates from an
external query language to SQL run on the database. This test db
contains 12 user records.

With a single bcrypt hash taking ~300ms to compute, this is a recipe
for disaster and so the app only allows queries that require only a
very small number of bcrypt computation.

E.g. the user must always "AND" the password lookup with a clause like
" username = 'foo' AND password = 'bar'" (username is unique).

However, while the query above technically only needs to compute 1
hash (there is a user 'detkin' and email 'foo@xxxxxxxxxxx' does not
exist), it instead creates a query plan that computes hashes *before*
filtering on username and email, leading to 12 hash computations and a
very slow query.

The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

The schemas for the 3 tables involved are here:
http://pgsql.privatepaste.com/f72020ad0a

As a quick experiment I tried moving the joins and email lookup into a
nested IN query, but that still generates a plan that computes hashes
for all 12 users, before picking out the 1 whose username matches.

Is there any way I can get postgres to perform the hash calculations
on the *result* of the other parts of the where clause, instead of the
other way around? Or else rewrite the query?

Cheers,
Erik

(untested), but how about something like the following:

WITH au AS (
    SELECT DISTINCT u.*
    FROM auth_user u
    JOIN bb_userprofile p ON p.user_id = u.id
    JOIN bb_identity i ON i.profile_id = p.id
    WHERE u.username ILIKE 'detkin'
    OR i.email ILIKE 'foo@xxxxxxxxxxx')

SELECT au.*
FROM au
WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password FROM 8));


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux