On Tue, Apr 14, 2015 at 3:29 PM, Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> wrote:
On 4/14/15 4:44 PM, Pawel Veselov wrote:
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx<mailto:Jim.Nasby@bluetreble.com>> wrote:
On 4/14/15 1:28 PM, Pawel Veselov wrote:
I wonder if what I need to do, considering that I update a lot
of "the
same" rows as I process this queue, is to create a temp table,
update
the rows there, and then update the actual tables once at the end...
That's what I'd do.
Well, in short, I changed (repeat the body of loop for how many tables
are there)
LOOP (item)
UPDATE table with item
IF not found INSERT item INTO table; END IF;
END LOOP;
to:
CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
LOOP
UPDATE xq_table with item;
exit when found;
INSERT INTO xq_table select * from table for update;
continue when found;
INSERT item INTO xq_table;
exit;
END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
FROM xq_table xq
WHERE (a.keys) = (xq.keys)
That works significantly faster. The final update statement is very
fast. The process is somewhat slow in the beginning as it sucks in
records from "total" into "xq_total", but once all of that is moved into
the temp table, it rushes through the rest.
Databases like to think in sets. It will generally be more efficient to do set operations instead of a bunch of row-by-row stuff.
Since you're pulling all of this from some other table your best bet is probably something like:
CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;
CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;
-- Same thing for daily
-- Same thing for total
In my previous post, there was a problem with that pseudo-code, as it's missing inserts into the final table at the end of loop, for those records that need to be inserted and not updated.
This is where using sets becomes really tedious, as Postgres severely lacks an upsert-like statement.
I don't think there are joins allowed in UPDATE statement, so I will need to use WITH query, right?
Also, I'm not sure how LEFT JOIN will help me isolate and insert "missed" entries...
Would it be OK to replace upsert part with merging into a temp table, then deleting and inserting from temp table? Is there any penalty for insert/delete comparing to update?
[skipped]
But remember that if you update or delete a row, removing it from an
index, the data will stay in that index until vacuum comes along.
Also, there's no point in doing a REINDEX after a VACUUM FULL;
vacuum full rebuilds all the indexes for you.
I was being desperate :)
I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is
part of a test deployment, there is no constant load, the only data that
is being written now is when I do these tests. Vacuuming should prune
all that dead stuff, and if it's absent, it's unclear where is the time
spent navigating/updating the table with 24 rows :)
I think you definitely have a problem with dead rows, as evidenced by the huge improvement VACUUM FULL made.
But it's not clear why (and not reasonable, IMHO, that) it wouldn't improve past current point.