Hello, I've written an SQL function that takes a single JSON parameter, in this case an array of objects, each with eight properties (example below). The function uses json_to_recordset() in a CTE to insert three rows across two tables per input record. It takes nearly 7 minutes to insert my dataset of 8935 records (in JSON), driven by a small Python script (the script itself parses the JSON in under a second), on PostgreSQL 13 on my MacBook Air.

As an experiment I wrote a second SQL function with eight parameters, one per property of the JSON objects, which requires a discrete function call per record. A Python script that inserts all 8935 records (making as many calls to that function) finishes in around 2 minutes.

I'm very much a novice at interpreting EXPLAIN (ANALYZE) output and am hoping someone can help me better optimize my original function. Both the function and the EXPLAIN (ANALYZE) results are provided below. Is it perhaps a limitation of CTEs or of json_to_recordset(), such that an entirely different approach is necessary (like the second function I wrote, with one call per record)?

I was really hoping to make this work. I've written a small API to my database using SQL and PL/pgSQL functions, each taking a single JSON parameter, so that my web backend acts almost as an HTTP proxy to the database. It's a different way of doing things as far as web development goes, but (for me) an interesting experiment. I like PostgreSQL and really want to take advantage of it.

I've tried a few "optimizations":

- Removing the final SELECT and returning VOID saves around 1.5 minutes.
- Removing some extra JOINs saves a little time, but nothing substantial. (The joins against account.t_account in some places are there to check "ownership" of the record by a given client_id.)
- Using a subquery seems like it could be much faster than a CTE, but I don't know how to insert three rows across two tables from a subquery; the closest idea I have is sketched after the function definition below.

Any advice is appreciated. Thank you in advance,

Matt

INPUT: (the CTE in the function below lists all the properties and their types)

[
    {
        "bank_account_id": 1324,
        "transaction_id": "abc123",
        "transaction_date": "2020-10-20",
        ...
    },
    ...
]

OUTPUT: (I'm not sure what I've done to create the nested arrays, but it's unnecessary; as noted above, returning VOID instead shaves off ~1.5 minutes.)

[
    [
        { "id": 250185 },
        { "id": 250186 },
        ...
    ]
]
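(To be clear, the INPUT above is just the "data" array; the document actually passed to the function also carries a "context" object holding the client_id, as the function body shows. A call looks roughly like this; all values here are made up for illustration:)

SELECT journal.create_with_categories('{
    "context": { "client_id": 42 },
    "data": [
        {
            "bank_account_id": 1324,
            "transaction_id": "abc123",
            "transaction_date": "2020-10-20",
            "posted_date": "2020-10-21",
            "amount": 49.99,
            "description": "example transaction",
            "parent_account_id": 12,
            "child_account_id": null
        }
    ]
}'::json);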
FUNCTION:

CREATE OR REPLACE FUNCTION journal.create_with_categories(in_json JSON)
RETURNS JSON AS $$
    WITH data AS (
        SELECT (in_json#>>'{context, client_id}')::BIGINT AS client_id,
               nextval(pg_get_serial_sequence('journal', 'id')) AS journal_id,
               bank_transaction_type,
               x.*
          FROM json_to_recordset(in_json->'data') AS x (
                   bank_account_id   BIGINT,
                   transaction_id    TEXT,
                   transaction_date  DATE,
                   posted_date       DATE,
                   amount            finance.monetary,
                   description       TEXT,
                   parent_account_id BIGINT,
                   child_account_id  BIGINT
               ),
               LATERAL bank_account.get_transaction_type_by_id(x.bank_account_id, x.amount)
                   AS bank_transaction_type
    ),
    insert_journal_entry AS (
        INSERT INTO journal.journal (
            client_id, id, bank_account_id, transaction_id,
            transaction_date, posted_date, description
        )
        SELECT client_id, journal_id, bank_account_id, transaction_id,
               transaction_date, posted_date, description
          FROM data
    ),
    insert_account_entries AS (
        INSERT INTO journal.account2journal (
            account_id, journal_id, amount, type
        )
        -- T account
        SELECT t.id,
               d.journal_id,
               @ d.amount,    -- @ is the absolute-value operator
               CASE WHEN d.bank_transaction_type = 'debit'::transaction_type
                    THEN 'credit'::transaction_type
                    ELSE 'debit'::transaction_type
               END
          FROM data d
          LEFT JOIN account.t_account t
                 ON (t.id = COALESCE(d.child_account_id, d.parent_account_id))
         WHERE t.client_id = d.client_id
            OR t.id IS NULL
        UNION ALL
        -- bank account
        SELECT t.id,
               d.journal_id,
               @ d.amount,
               d.bank_transaction_type
          FROM data d
          JOIN bank_account.bank_account b ON (b.id = d.bank_account_id)
          JOIN account.t_account t ON (t.id = b.t_account_id)
         WHERE t.client_id = d.client_id
    )
    SELECT json_agg(d)
      FROM (SELECT d.journal_id AS id FROM data AS d) AS d;
$$ LANGUAGE sql;
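As mentioned above, I don't know how to express the three inserts as a subquery; the closest alternative I can picture is a PL/pgSQL variant that parses the JSON exactly once into a temporary table and then issues the two INSERTs as ordinary statements. This is an untested sketch (the _v2 name is made up, and ON COMMIT DROP assumes one call per transaction), not something I have benchmarked:

CREATE OR REPLACE FUNCTION journal.create_with_categories_v2(in_json JSON)
RETURNS VOID AS $$
BEGIN
    -- Parse and type the JSON once; nothing below re-evaluates it.
    DROP TABLE IF EXISTS tmp_data;
    CREATE TEMP TABLE tmp_data ON COMMIT DROP AS
    SELECT (in_json#>>'{context, client_id}')::BIGINT AS client_id,
           nextval(pg_get_serial_sequence('journal', 'id')) AS journal_id,
           bank_account.get_transaction_type_by_id(x.bank_account_id, x.amount)
               AS bank_transaction_type,
           x.*
      FROM json_to_recordset(in_json->'data') AS x (
               bank_account_id BIGINT, transaction_id TEXT,
               transaction_date DATE, posted_date DATE,
               amount finance.monetary, description TEXT,
               parent_account_id BIGINT, child_account_id BIGINT
           );

    INSERT INTO journal.journal (client_id, id, bank_account_id, transaction_id,
                                 transaction_date, posted_date, description)
    SELECT client_id, journal_id, bank_account_id, transaction_id,
           transaction_date, posted_date, description
      FROM tmp_data;

    -- Same two SELECTs as in the CTE version, reading from the temp table.
    INSERT INTO journal.account2journal (account_id, journal_id, amount, type)
    SELECT t.id, d.journal_id, @ d.amount,
           CASE WHEN d.bank_transaction_type = 'debit'::transaction_type
                THEN 'credit'::transaction_type
                ELSE 'debit'::transaction_type END
      FROM tmp_data d
      LEFT JOIN account.t_account t
             ON (t.id = COALESCE(d.child_account_id, d.parent_account_id))
     WHERE t.client_id = d.client_id OR t.id IS NULL
    UNION ALL
    SELECT t.id, d.journal_id, @ d.amount, d.bank_transaction_type
      FROM tmp_data d
      JOIN bank_account.bank_account b ON (b.id = d.bank_account_id)
      JOIN account.t_account t ON (t.id = b.t_account_id)
     WHERE t.client_id = d.client_id;
END;
$$ LANGUAGE plpgsql;

(The hope is that materializing the parsed rows once, in a real table, sidesteps whatever the data CTE is spending its time on.)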
EXPLAIN ANALYZE: (From logs)

Aggregate  (cost=24.24..24.25 rows=1 width=32) (actual time=388926.249..388926.371 rows=1 loops=1)
  Buffers: shared hit=53877 dirtied=2
  CTE data
    ->  Nested Loop  (cost=0.26..4.76 rows=100 width=148) (actual time=183.906..388716.550 rows=8935 loops=1)
          Buffers: shared hit=53877 dirtied=2
          ->  Function Scan on json_to_recordset x  (cost=0.01..1.00 rows=100 width=128) (actual time=130.645..142.316 rows=8935 loops=1)
          ->  Function Scan on get_transaction_type_by_id bank_transaction_type  (cost=0.25..0.26 rows=1 width=4) (actual time=0.154..0.156 rows=1 loops=8935)
                Buffers: shared hit=18054
  CTE insert_journal_entry
    ->  Insert on journal  (cost=0.00..2.00 rows=100 width=96) (actual time=453.563..453.563 rows=0 loops=1)
          Buffers: shared hit=79242 dirtied=295
          ->  CTE Scan on data  (cost=0.00..2.00 rows=100 width=96) (actual time=0.006..10.001 rows=8935 loops=1)
  CTE insert_account_entries
    ->  Insert on account2journal  (cost=4.86..15.23 rows=2 width=52) (actual time=816.381..816.381 rows=0 loops=1)
          Buffers: shared hit=159273 dirtied=335 written=17
          ->  Result  (cost=4.86..15.23 rows=2 width=52) (actual time=0.206..109.222 rows=17870 loops=1)
                Buffers: shared hit=5
                ->  Append  (cost=4.86..15.20 rows=2 width=52) (actual time=0.197..95.060 rows=17870 loops=1)
                      Buffers: shared hit=5
                      ->  Hash Left Join  (cost=4.86..7.14 rows=1 width=52) (actual time=0.195..35.512 rows=8935 loops=1)
                            Hash Cond: (COALESCE(d_1.child_account_id, d_1.parent_account_id) = t.id)
                            Filter: ((t.client_id = d_1.client_id) OR (t.id IS NULL))
                            Buffers: shared hit=2
                            ->  CTE Scan on data d_1  (cost=0.00..2.00 rows=100 width=68) (actual time=0.004..6.544 rows=8935 loops=1)
                            ->  Hash  (cost=3.27..3.27 rows=127 width=16) (actual time=0.137..0.137 rows=127 loops=1)
                                  Buffers: shared hit=2
                                  ->  Seq Scan on t_account t  (cost=0.00..3.27 rows=127 width=16) (actual time=0.026..0.073 rows=127 loops=1)
                                        Buffers: shared hit=2
                      ->  Hash Join  (cost=3.80..8.03 rows=1 width=52) (actual time=40.182..53.796 rows=8935 loops=1)
                            Hash Cond: ((t_1.id = b.t_account_id) AND (t_1.client_id = d_2.client_id))
                            Buffers: shared hit=3
                            ->  Seq Scan on t_account t_1  (cost=0.00..3.27 rows=127 width=16) (actual time=0.022..0.079 rows=127 loops=1)
                                  Buffers: shared hit=2
                            ->  Hash  (cost=3.59..3.59 rows=14 width=60) (actual time=40.118..40.118 rows=8935 loops=1)
                                  Buffers: shared hit=1
                                  ->  Hash Join  (cost=1.32..3.59 rows=14 width=60) (actual time=0.071..17.863 rows=8935 loops=1)
                                        Hash Cond: (d_2.bank_account_id = b.id)
                                        Buffers: shared hit=1
                                        ->  CTE Scan on data d_2  (cost=0.00..2.00 rows=100 width=60) (actual time=0.005..3.740 rows=8935 loops=1)
                                        ->  Hash  (cost=1.14..1.14 rows=14 width=16) (actual time=0.030..0.030 rows=14 loops=1)
                                              Buffers: shared hit=1
                                              ->  Seq Scan on bank_account b  (cost=0.00..1.14 rows=14 width=16) (actual time=0.012..0.016 rows=14 loops=1)
                                                    Buffers: shared hit=1
  ->  CTE Scan on data d  (cost=0.00..2.00 rows=100 width=8) (actual time=183.918..388812.950 rows=8935 loops=1)
        Buffers: shared hit=53877 dirtied=2
Trigger for constraint journal_client_id_fkey on journal: time=194.194 calls=8935
Trigger for constraint journal_bank_account_id_fkey on journal: time=204.014 calls=8935
Trigger trigger_journal_import_sequence on journal: time=373.344 calls=1
Trigger for constraint account2journal_account_id_fkey on account2journal: time=580.482 calls=17870
Trigger trigger_debits_equal_credits_on_insert on account2journal: time=116.653 calls=1
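(In case it helps anyone reproduce this: the plan above comes from the server logs rather than a plain EXPLAIN ANALYZE, since EXPLAIN on the outer SELECT only shows the function call, not the statements inside it. An auto_explain setup along these lines will log nested plans like this one; a sketch, it needs superuser, and the exact thresholds are up to you:)

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;          -- log a plan for every statement
SET auto_explain.log_analyze = true;            -- include actual times and row counts
SET auto_explain.log_buffers = true;            -- include buffer usage
SET auto_explain.log_nested_statements = true;  -- descend into statements run inside functions
-- then call the function and read the plan from the log:
-- SELECT journal.create_with_categories(<json>);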