Re: [PERFORM] Query > 1000× slowdown after adding datetime comparison

On Mon, Aug 31, 2015 at 12:09 PM, twoflower <standa.kurik@xxxxxxxxx> wrote:
I have the following three tables:

DOCUMENT
  id (index)
  documenttype
  date_last_updated: timestamp(6) (indexed)

EXTERNAL_TRANSLATION_UNIT
  id (indexed)
  fk_id_document (indexed)

EXTERNAL_TRANSLATION
  id (indexed)
  fk_id_translation_unit (indexed)

Table sizes:
 DOCUMENT: 381 000
 EXTERNAL_TRANSLATION_UNIT: 76 000 000
 EXTERNAL_TRANSLATION: 76 000 000

Now the following query takes about 36 minutes to finish:

 SELECT u.id AS id_external_translation_unit,
    r.id AS id_external_translation,
    u.fk_id_language AS fk_id_source_language,
    r.fk_id_language AS fk_id_target_language,
    doc.fk_id_job
 FROM "EXTERNAL_TRANSLATION_UNIT" u
     JOIN "DOCUMENT" doc ON u.fk_id_document = doc.id
     JOIN "EXTERNAL_TRANSLATION" r ON u.id = r.fk_id_translation_unit
 WHERE doc.date_last_updated >= date(now() - '171:00:00'::interval)
 ORDER BY r.id LIMIT 1000

This is the query plan:

<http://postgresql.nabble.com/file/n5864045/qp1.png>

If I remove the WHERE condition, it returns immediately.


So does "SELECT 1;", but since that doesn't give the same answer, it is not very relevant.
 
Am I doing something obviously wrong?

Not obviously...


Thank you for any ideas.

Consider updating the translation tables at the same time the document table is updated. That way you can apply the WHERE and ORDER BY clauses against the same table.
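
A minimal sketch of that idea (the column, function, and trigger names here are hypothetical, not from your schema, and I'm assuming documents are touched via UPDATEs of date_last_updated):

    -- carry the document's timestamp on the unit table, indexed
    ALTER TABLE "EXTERNAL_TRANSLATION_UNIT" ADD COLUMN date_last_updated timestamp(6);
    CREATE INDEX ON "EXTERNAL_TRANSLATION_UNIT" (date_last_updated);

    -- keep it in sync whenever the parent document is touched
    CREATE OR REPLACE FUNCTION sync_unit_timestamp() RETURNS trigger AS $$
    BEGIN
        UPDATE "EXTERNAL_TRANSLATION_UNIT"
           SET date_last_updated = NEW.date_last_updated
         WHERE fk_id_document = NEW.id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER document_touched
        AFTER UPDATE OF date_last_updated ON "DOCUMENT"
        FOR EACH ROW EXECUTE PROCEDURE sync_unit_timestamp();

The cost is that each document update now also writes ~200 unit rows (76M units / 381K documents), so whether the trade is worth it depends on your write rate.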

I presume you've run ANALYZE on the data.
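
If not, refreshing statistics on all three tables is cheap and sometimes decisive:

    ANALYZE "DOCUMENT";
    ANALYZE "EXTERNAL_TRANSLATION_UNIT";
    ANALYZE "EXTERNAL_TRANSLATION";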

I would probably try something like:

WITH docs AS (SELECT id FROM "DOCUMENT"
              WHERE date_last_updated >= date(now() - '171:00:00'::interval))
SELECT u.id, r.id, u.fk_id_language, r.fk_id_language
FROM "EXTERNAL_TRANSLATION_UNIT" u
     JOIN "EXTERNAL_TRANSLATION" r ON u.id = r.fk_id_translation_unit
WHERE EXISTS (SELECT 1 FROM docs WHERE docs.id = u.fk_id_document)
ORDER BY r.id LIMIT 1000;

(If you also need doc.fk_id_job in the output, join docs instead of using EXISTS.)

You are trying to avoid the nested loop. The above has a decent chance of materializing docs and then building a hash or bitmap for both docs and the translation tables, thus performing a single sequential scan over each instead of 70+ million index lookups.
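
You can also test that theory directly without rewriting anything. In a session (diagnostic only; don't leave it set):

    SET enable_nestloop = off;   -- session-local: discourage nested-loop joins
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT ... ;                 -- the original query goes here
    RESET enable_nestloop;

If the hash/merge plan comes back much faster, the problem is the planner's row estimates rather than anything structural in the query.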

Take this with a grain of salt, as my fluency in this area is limited; I tend to work by trial and error, and without the data that is difficult.

I'm not sure whether the planner could be smarter here, because you are asking a question it is not particularly suited to estimating: namely, cross-table correlations. Rethinking the model is likely to give you a better outcome long-term, though it does seem like there should be room for improvement within the stated query and model.

As Tomas said, you will likely benefit from increased work_mem in order to make materializing and hashing/bitmapping favorable compared to a nested loop.
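
For example (256MB is only a starting point; work_mem applies per sort/hash node per query, so size it against your RAM and concurrency):

    SET work_mem = '256MB';   -- session-local; the default is typically 4MB
    -- then re-run the query, or its EXPLAIN ANALYZE, in the same session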

David J.

