Trying to reply to Rob:
Apologies if this does not end up in the thread (gmail is just driving me mad, I can't seem to receive messages, so I've subscribed again)
For some reason Limit 1 cause my query to go on for minutes without a response, which was not the case.
The following query takes about 10.6 seconds, with a table that has about 20 million rows. First, the indices of the table with create statement:
CREATE TABLE public.path_value (
val_string TEXT,
feature_mapping_id INTEGER NOT NULL,
parent_feature_mapping_id INTEGER,
feature_name TEXT,
rm_type_name TEXT,
path INTEGER NOT NULL,
payload_id INTEGER NOT NULL,
id INTEGER NOT NULL,
ehr_id INTEGER,
CONSTRAINT path_value_pkey PRIMARY KEY(id)
) WITHOUT OIDS;
CREATE INDEX path_value_idx ON public.path_value
USING btree (rm_type_name COLLATE pg_catalog."default", feature_name COLLATE pg_catalog."default");
CREATE INDEX path_value_idx1 ON public.path_value
USING btree (feature_mapping_id, parent_feature_mapping_id);
CREATE INDEX path_value_idx2 ON public.path_value
USING btree (payload_id);
Now results of EXPLAIN ANALYZE:
"QUERY PLAN"
"Nested Loop Semi Join (cost=543422.34..571013.45 rows=5 width=4) (actual time=5854.267..10684.798 rows=82003 loops=1)"
" Join Filter: (root.feature_mapping_id = anodeid.parent_feature_mapping_id)"
" -> Merge Join (cost=543422.34..543426.29 rows=12 width=24) (actual time=5854.182..5953.531 rows=82003 loops=1)"
" Merge Cond: ((root.feature_mapping_id = node1.parent_feature_mapping_id) AND (root.payload_id = node2.payload_id))"
" -> Sort (cost=1150.65..1151.35 rows=283 width=12) (actual time=359.940..380.707 rows=82003 loops=1)"
" Sort Key: root.feature_mapping_id, root.payload_id"
" Sort Method: external merge Disk: 1768kB"
" -> Index Scan using path_value_idx on path_value root (cost=0.00..1139.12 rows=283 width=12) (actual time=0.073..261.240 rows=82003 loops=1)"
" Index Cond: ((rm_type_name = 'COMPOSITION'::text) AND (feature_name = 'composition'::text))"
" -> Sort (cost=542271.69..542272.31 rows=247 width=12) (actual time=5494.236..5516.788 rows=82003 loops=1)"
" Sort Key: node1.parent_feature_mapping_id, node2.payload_id"
" Sort Method: external sort Disk: 2088kB"
" -> HashAggregate (cost=542259.40..542261.87 rows=247 width=12) (actual time=5377.761..5394.453 rows=82003 loops=1)"
" -> Hash Semi Join (cost=506297.37..542253.38 rows=1205 width=12) (actual time=2788.758..5325.838 rows=164004 loops=1)"
" Hash Cond: ((node1.payload_id = node2.payload_id) AND (node1.feature_mapping_id = node2.parent_feature_mapping_id))"
" -> Bitmap Heap Scan on path_value node1 (cost=77.61..9421.62 rows=2464 width=12) (actual time=76.628..2430.473 rows=246005 loops=1)"
" Recheck Cond: ((rm_type_name = 'CONTENTITEM'::text) AND (feature_name = 'content'::text))"
" -> Bitmap Index Scan on path_value_idx (cost=0.00..77.00 rows=2464 width=0) (actual time=75.640..75.640 rows=246005 loops=1)"
" Index Cond: ((rm_type_name = 'CONTENTITEM'::text) AND (feature_name = 'content'::text))"
" -> Hash (cost=498674.38..498674.38 rows=399092 width=8) (actual time=2711.653..2711.653 rows=410015 loops=1)"
" Buckets: 4096 Batches: 16 Memory Usage: 1013kB"
" -> Bitmap Heap Scan on path_value node2 (cost=10345.32..498674.38 rows=399092 width=8) (actual time=71.718..2607.467 rows=410015 loops=1)"
" Recheck Cond: (rm_type_name = 'ITEMSTRUCTURE'::text)"
" -> Bitmap Index Scan on path_value_idx (cost=0.00..10245.55 rows=399092 width=0) (actual time=69.634..69.634 rows=410015 loops=1)"
" Index Cond: (rm_type_name = 'ITEMSTRUCTURE'::text)"
" -> Index Scan using path_value_idx2 on path_value anodeid (cost=0.00..2298.91 rows=1 width=8) (actual time=0.057..0.057 rows=1 loops=82003)"
" Index Cond: (payload_id = node2.payload_id)"
" Filter: ((feature_name = 'archetypeNodeId'::text) AND (val_string = 'openEHR-EHR-COMPOSITION.discharge.v1'::text))"
"Total runtime: 10692.277 ms"
and finaly, the query that gave this result:
EXPLAIN ANALYZE SELECT root.id from path_value as root
WHERE
root.rm_type_name = 'COMPOSITION'
AND
root.feature_name = 'composition'
AND
EXISTS (SELECT anodeid.id from path_value as anodeid
WHERE
anodeId.parent_feature_mapping_id = root.feature_mapping_id
AND
anodeId.payload_id = root.payload_id
AND
anodeId.feature_name = 'archetypeNodeId'
AND
anodeId.val_string = 'openEHR-EHR-COMPOSITION.discharge.v1'
)
AND
EXISTS (SELECT 1 from path_value as node1
WHERE
node1.payload_id = root.payload_id
AND
node1.parent_feature_mapping_id = root.feature_mapping_id
AND
node1.feature_name = 'content'
AND
node1.rm_type_name = 'CONTENTITEM'
AND
EXISTS (SELECT 1 from path_value as node2
WHERE
node2.payload_id = node1.payload_id
AND
node2.parent_feature_mapping_id = node1.feature_mapping_id
AND
node2.rm_type_name = 'ITEMSTRUCTURE'
)
)
Is there a glaring error in my approach? Should I be better off with another SQL query, or Ltree/XPATH queries?
Best regards
Seref