Postgres Gurus, Thank you all so much for your continued hard work with Postgres—the DB is amazing and I continue to look forward to each new release! One feature I’ve grown to love in the last year or two is the ltree column type. Unfortunately, I am starting to run into significant DB contention when performing basic and complex queries involving the ltree path and its associated GiST index. Here’s my setup: * PostgreSQL version 9.6.1 running on AWS RDS * 1 x master (db.r3.large - 2 vCPUs, 15GB RAM) * 1 x read replica (db.r3.large - 2 vCPUs, 15GB RAM) * extensions: * ltree v1.1 * btree-gist v1.2 I have a table called “organizations” with about 500,000 rows in it. The relevant portion of the schema looks like this: Table "public.organizations" Column | Type | Modifiers -----------------+--------------------------+-------------------------------- id | character varying(36) | not null path | ltree | Indexes: "organizations_pkey" PRIMARY KEY, btree (id) "path_gist_idx" gist (path) "path_idx" btree (path) The id column is a UUID and the path column is an ltree containing id-like hierarchical values. Example data is: id | path --------------------------------------+--------------------------------------------------------------------------------------------------------- dba3511f-32ef-486f-ade2-7540ae28922e | root.dba3511f32ef486fade27540ae28922e 0209a983-88fa-47df-8328-6d9f39d60951 | root.dba3511f32ef486fade27540ae28922e.0209a98388fa47df83286d9f39d60951 05a49dba-a823-42e3-9f4b-f9ec9cdffcde | root.dba3511f32ef486fade27540ae28922e.05a49dbaa82342e39f4bf9ec9cdffcde 07166591-aba2-4e91-a00b-e1491edaa9b3 | root.dba3511f32ef486fade27540ae28922e.07166591aba24e91a00be1491edaa9b3 0777d32b-33f9-4131-a552-7e8b8a1355bb | root.dba3511f32ef486fade27540ae28922e.0777d32b33f94131a5527e8b8a1355bb 07ad8c30-76ad-4ea9-99c5-8e934ce45b03 | root.dba3511f32ef486fade27540ae28922e.07ad8c3076ad4ea999c58e934ce45b03 09566d1a-4924-4311-8687-d4389c130e76 | root.dba3511f32ef486fade27540ae28922e.09566d1a492443118687d4389c130e76 09cca793-af47-4f79-938f-72e1f37a5580 | root.dba3511f32ef486fade27540ae28922e.09cca793af474f79938f72e1f37a5580 0edc1ba7-830a-4da9-8f69-d3eb0611946f | root.dba3511f32ef486fade27540ae28922e.0edc1ba7830a4da98f69d3eb0611946f 10b20349-7da4-41d8-a780-e09b750b9236 | root.dba3511f32ef486fade27540ae28922e.10b203497da441d8a780e09b750b9236 2d55bfbb-4785-4368-a7e8-45afef6ae753 | root.dba3511f32ef486fade27540ae28922e.10b203497da441d8a780e09b750b9236.2d55bfbb47854368a7e845afef6ae753 …… (196 rows) I’m trying to track down issues causing queries to take 10-200ms and, I think, causing contention within the database which exacerbates the all query times. Issue #1: the query planner costs seem to be inaccurate, despite running ANALYZE on the organizations table and REINDEX on the GiST index: db=> EXPLAIN ANALYZE SELECT path FROM organizations WHERE path ~ 'root.dba3511f32ef486fade27540ae28922e.*{1,}'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on organizations (cost=72.25..1880.85 rows=495 width=134) (actual time=0.273..0.547 rows=195 loops=1) Recheck Cond: (path ~ 'root.dba3511f32ef486fade27540ae28922e.*{1,}'::lquery) Heap Blocks: exact=175 -> Bitmap Index Scan on path_gist_idx (cost=0.00..72.12 rows=495 width=0) (actual time=0.252..0.252 rows=195 loops=1) Index Cond: (path ~ 'root.dba3511f32ef486fade27540ae28922e.*{1,}'::lquery) Planning time: 0.061 ms Execution time: 0.603 ms (7 rows) The previous sample data excerpt shows 196 rows out of almost 500,000 are a child of the ltree path being searched. The query planner shows a cost estimate of: cost=72.25..1880.85 rows=495 and an actual time/rows of: actual time=0.273..0.547 rows=195 If I modify the query slightly to avoid the ‘~’ operator with the ltree GiST index, it improves the cost estimate and, in a production environment with lots of contention, the query takes about half the time to execute: db=> EXPLAIN ANALYZE SELECT path FROM organizations WHERE path <@ 'root.dba3511f32ef486fade27540ae28922e' AND path != 'root.dba3511f32ef486fade27540ae28922e'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on organizations (cost=8.80..204.18 rows=50 width=134) (actual time=0.298..0.606 rows=195 loops=1) Recheck Cond: (path <@ 'root.dba3511f32ef486fade27540ae28922e'::ltree) Filter: (path <> 'root.dba3511f32ef486fade27540ae28922e'::ltree) Rows Removed by Filter: 1 Heap Blocks: exact=176 -> Bitmap Index Scan on path_gist_idx (cost=0.00..8.79 rows=50 width=0) (actual time=0.222..0.222 rows=196 loops=1) Index Cond: (path <@ 'root.dba3511f32ef486fade27540ae28922e'::ltree) Planning time: 0.115 ms Execution time: 0.665 ms (9 rows) Both queries hit the GiST index and yet still have wildly inaccurate cost estimates. Can anyone provide advice on how to further track down the contention issues related to these queries? Or further optimizations to make on the ltree field? I think this is all related to the inaccurate cost estimates but I’m not sure. Issue #2: When using the same query with a join, combined with high CPU utilization due to contention, the an entire query takes orders of magnitude longer (32ms) than it’s individual components (index only scan of 0.161ms and bitmap heap scan of 0.804ms). db=> EXPLAIN ANALYZE SELECT o.path FROM organizations o, hostname h WHERE o.id = h.organization_id AND path ~ 'root.dba3511f32ef486fade27540ae28922e.*{1,}'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=72.66..3303.89 rows=41 width=134) (actual time=32.530..32.530 rows=0 loops=1) -> Bitmap Heap Scan on organizations o (cost=72.25..1880.85 rows=495 width=171) (actual time=0.400..0.894 rows=195 loops=1) Recheck Cond: (path ~ 'root.dba3511f32ef486fade27540ae28922e.*{1,}'::lquery) Heap Blocks: exact=175 -> Bitmap Index Scan on path_gist_idx (cost=0.00..72.12 rows=495 width=0) (actual time=0.374..0.374 rows=195 loops=1) Index Cond: (path ~ 'root.dba3511f32ef486fade27540ae28922e.*{1,}'::lquery) -> Index Only Scan using hostname_organization_id_idx on hostname h (cost=0.41..2.86 rows=1 width=37) (actual time=0.161..0.161 rows=0 loops=195) Index Cond: (organization_id = (o.id)::text) Heap Fetches: 0 Planning time: 0.521 ms Execution time: 32.577 ms (11 rows) Does anyone have any ideas on helping to track down issues with contention? I’m not seeing any locks, so I’m thinking it’s related to these queries taking longer and stacking up on top of each other. The above JOIN query is taking on average 30ms to complete at a rate of about 10 queries/second. Thanks! PC -------- |
Attachment:
smime.p7s
Description: S/MIME cryptographic signature