
Re: Help with slow table update

On 4/14/15 4:44 PM, Pawel Veselov wrote:
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@xxxxxxxxxxxxxx> wrote:

    On 4/14/15 1:28 PM, Pawel Veselov wrote:


        I wonder if what I need to do, considering that I update a lot of
        "the same" rows as I process this queue, is to create a temp table,
        update the rows there, and then update the actual tables once at
        the end...


    That's what I'd do.


Well, in short, I changed (repeating the body of the loop for however many
tables there are)

LOOP (item)
   UPDATE table with item
   IF not found INSERT item INTO table; END IF;
END LOOP;

to:

CREATE TEMP TABLE xq_table (LIKE table) ON COMMIT DROP;
LOOP (item)
   LOOP
     UPDATE xq_table with item;
     EXIT WHEN FOUND;
     INSERT INTO xq_table SELECT * FROM table FOR UPDATE;
     CONTINUE WHEN FOUND;
     INSERT item INTO xq_table;
     EXIT;
   END LOOP;
END LOOP;
UPDATE table a SET (rows) = (xq.rows)
   FROM xq_table xq
   WHERE (a.keys) = (xq.keys);

That works significantly faster. The final update statement is very
fast. The process is somewhat slow in the beginning as it sucks in
records from "total" into "xq_total", but once all of that is moved into
the temp table, it rushes through the rest.
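
For concreteness, a minimal runnable sketch of that pattern in PL/pgSQL, with
hypothetical table and column names (queue, target, item_key, item_value)
standing in for the real schema:

BEGIN;

CREATE TEMP TABLE xq_target (LIKE target) ON COMMIT DROP;

DO $$
DECLARE
    item RECORD;
BEGIN
    FOR item IN SELECT * FROM queue LOOP
        LOOP
            -- Try to apply the item to a row already staged in the temp table.
            UPDATE xq_target
               SET item_value = item_value + item.item_value
             WHERE item_key = item.item_key;
            EXIT WHEN FOUND;

            -- Not staged yet: copy (and lock) the existing row from the real
            -- table, then loop back so the UPDATE above applies the item.
            INSERT INTO xq_target
                SELECT * FROM target WHERE item_key = item.item_key FOR UPDATE;
            CONTINUE WHEN FOUND;

            -- The row doesn't exist anywhere yet: stage a brand-new one.
            INSERT INTO xq_target (item_key, item_value)
                 VALUES (item.item_key, item.item_value);
            EXIT;
        END LOOP;
    END LOOP;
END $$;

-- One set-based write-back at the end.  Rows that were newly staged and never
-- existed in the real table would still need a corresponding INSERT.
UPDATE target t
   SET item_value = xq.item_value
  FROM xq_target xq
 WHERE t.item_key = xq.item_key;

COMMIT;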

Databases like to think in sets. It will generally be more efficient to do set operations instead of a bunch of row-by-row stuff.

Since you're pulling all of this from some other table, your best bet is probably something like:

CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;

-- Same thing for daily
-- Same thing for total
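
Filled in with hypothetical names (queue(device_id, event_time, hits) and
ar_hourly(device_id, bucket, hits)), one way to write that is below; the
DELETE ... RETURNING is captured through a data-modifying CTE feeding an
INSERT into a pre-created temp table:

BEGIN;

-- Pull the pending work out of the queue in a single statement.
CREATE TEMP TABLE raw (LIKE queue) ON COMMIT DROP;

WITH moved AS (
    DELETE FROM queue          -- add whatever WHERE clause limits the batch
    RETURNING *
)
INSERT INTO raw SELECT * FROM moved;

-- Summarize once per aggregation level.
CREATE TEMP VIEW hourly_v AS
SELECT device_id,
       date_trunc('hour', event_time) AS bucket,
       sum(hits) AS hits
  FROM raw
 GROUP BY 1, 2;

-- Update the rows that already exist in the summary table...
UPDATE ar_hourly a
   SET hits = a.hits + v.hits
  FROM hourly_v v
 WHERE a.device_id = v.device_id
   AND a.bucket = v.bucket;

-- ...and insert the ones that don't.
INSERT INTO ar_hourly (device_id, bucket, hits)
SELECT v.device_id, v.bucket, v.hits
  FROM hourly_v v
  LEFT JOIN ar_hourly a
    ON a.device_id = v.device_id AND a.bucket = v.bucket
 WHERE a.device_id IS NULL;

-- Same pattern for the daily and total tables, summarizing from raw (or from
-- the hourly view) instead.

COMMIT;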

    The other option would be to use a constraint trigger paired with a
    per-row trigger on the hourly table to drive the daily table, and on
    the daily table to drive the total table. The way that would work is
    the per-row trigger would simply keep track of all the unique records
    that were changed in a statement (presumably by putting them in a
    temp table). Once the statement is "done", the constraint trigger
    would fire; it would summarize all the changed data and do a much
    smaller number of updates to the table being summarized into.


I'm not sure how I would be able to avoid the same number of changes on
the total table; the trigger would fire on each update, wouldn't it? So,
same problem with a lot of changes on a table...

The difference is that you'd be doing plain INSERTs into a temp table and then summarizing that. That's going to be a LOT more efficient than a slew of updates on an existing table.
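
A rough sketch of what that could look like, assuming a hypothetical schema
of ar_hourly(device_id, bucket timestamptz, hits) rolling up into
ar_daily(device_id, day date, hits); a real implementation would likely use a
per-session temp table for the change log, as suggested above:

-- Change log: which hourly rows were touched.
CREATE UNLOGGED TABLE changed_hourly (
    device_id int,
    bucket    timestamptz
);

-- Cheap per-row trigger: just record the key of the changed row.
CREATE FUNCTION note_hourly_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO changed_hourly (device_id, bucket)
    VALUES (NEW.device_id, NEW.bucket);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ar_hourly_note
AFTER INSERT OR UPDATE ON ar_hourly
FOR EACH ROW EXECUTE PROCEDURE note_hourly_change();

-- Deferred constraint trigger: when it fires (at commit, or earlier via SET
-- CONSTRAINTS), recompute the daily totals for the affected days in one
-- set-based statement, then clear the change log so the remaining queued
-- invocations do nothing.
CREATE FUNCTION roll_up_daily() RETURNS trigger AS $$
BEGIN
    UPDATE ar_daily d
       SET hits = s.hits
      FROM (SELECT h.device_id, h.bucket::date AS day, sum(h.hits) AS hits
              FROM ar_hourly h
              JOIN (SELECT DISTINCT device_id, bucket::date AS day
                      FROM changed_hourly) c
                ON c.device_id = h.device_id
               AND c.day = h.bucket::date
             GROUP BY h.device_id, h.bucket::date) s
     WHERE d.device_id = s.device_id
       AND d.day = s.day;
    -- (days not yet present in ar_daily would need a matching INSERT, omitted)
    DELETE FROM changed_hourly;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER ar_hourly_rollup
AFTER INSERT OR UPDATE ON ar_hourly
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE roll_up_daily();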

    BTW, you also made a comment about not having to hit the table if
    you look at something in an index. You can only do that if all the
    data you need is in the index, AND the page with the record is
    marked as being all-visible (google for Postgres Visibility Map). If
    that's not the case then you still have to pull in the row from the
    table in order to determine visibility. The only case where you can
    still avoid hitting the table is something like a NOT EXISTS; if you
    can't find any entries in the index for something then they
    definitely won't be in the table.


What I was saying is that if a table has a unique index, and there is a
cached fact that a particular index value points to a particular row,
there shouldn't be a need to re-scan the index again to search for any
more matching values (which would be necessary if the index were not
unique). Again, all depending on the size of the index, the number of
different index values being queried, etc.

It still has to rescan because of visibility concerns.
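
An easy way to see this on a hypothetical throwaway table; the "Heap Fetches"
line in EXPLAIN ANALYZE counts how often the heap still had to be visited
during an index-only scan:

CREATE TABLE t (id int PRIMARY KEY, val int);
INSERT INTO t SELECT g, g FROM generate_series(1, 100000) g;
ANALYZE t;

-- The visibility map is not populated yet, so even an index-only scan
-- should report a non-zero "Heap Fetches" count.
EXPLAIN ANALYZE SELECT id FROM t WHERE id BETWEEN 1000 AND 2000;

VACUUM t;   -- sets the all-visible bits in the visibility map

-- The same query should now show "Heap Fetches: 0".
EXPLAIN ANALYZE SELECT id FROM t WHERE id BETWEEN 1000 AND 2000;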

    But remember that if you update or delete a row, removing it from an
    index, the data will stay in that index until vacuum comes along.

    Also, there's no point in doing a REINDEX after a VACUUM FULL;
    VACUUM FULL rebuilds all the indexes for you.


I was being desperate :)

I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on a much
larger data set, and there is no problem (so far, but still). This machine
is part of a test deployment, there is no constant load, and the only data
being written now is from these tests. Vacuuming should prune all that
dead stuff, and once it's gone, it's unclear where the time is spent
navigating/updating a table with 24 rows :)

I think you definitely have a problem with dead rows, as evidenced by the huge improvement VACUUM FULL made.
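
If you want to check that without another VACUUM FULL, something like this
shows the dead-row count and when the table was last vacuumed (the table
name here is hypothetical):

SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname = 'ar_total';

-- More detail, including dead space, if the pgstattuple extension is installed:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('ar_total');
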
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



