On Wed, Oct 13, 2010 at 02:38, Neil Whelchel <neil.whelchel@xxxxxxxxx> wrote:
> And the cache helps...
> So, we are right back to within 10ms of where we started after INSERTing the
> data, but it took a VACUUM FULL to accomplish this (by making the table fit in
> RAM).
> This is a big problem on a production machine as the VACUUM FULL is likely to
> get in the way of INSERTing realtime data into the table.

Right, but the real point is: how often do you plan on mass updating the
table? That's (hopefully) the only time a VACUUM FULL should be needed.
Otherwise (auto)vacuum will probably work fine most of the time.

> 6. Keep tables that are to be updated frequently as narrow as possible: Link
> them to wider tables to store the columns that are less frequently updated.

Again, I don't think it's "updated frequently" so much as "mass updated".
I run some databases here that have tens to hundreds of updates every
second. The difference is I don't update *all* 26 million rows at the same
time that often. But if I did, I'd probably want to lower the fillfactor.

For example:

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 59387.021 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 1939 MB

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 70549.425 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 2909 MB

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 78551.544 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 3879 MB

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 74443.945 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 4848 MB

Here you see basically linear growth. Now, after some vacuuming:

=> VACUUM log;
VACUUM
Time: 193055.857 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 4848 MB

=> VACUUM log;
VACUUM
Time: 38281.541 ms

whopper=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 4848 MB

=> VACUUM log;
VACUUM
Time: 28.531 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 4848 MB

Hey... it's not shrinking it at all:

=> VACUUM verbose log;
INFO:  vacuuming "public.log"
INFO:  "log": found 0 removable, 0 nonremovable row versions in 31 out of 620425 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 2511 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_10544753"
INFO:  index "pg_toast_10544753_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_10544753": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 29.070 ms
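That's expected, by the way: a plain VACUUM only marks the dead space as
reusable within the table, it doesn't hand the file back to the OS (apart
from truncating empty pages off the very end), so the 4848 MB stays 4848 MB
no matter how many times you run it. If you want to see whether VACUUM even
has anything left to clean up, the stats collector already tracks it;
something like this (just an illustration, same 'log' table as above):

-- dead vs. live tuples as seen by the stats collector
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 WHERE relname = 'log';

Only VACUUM FULL (or CLUSTER) actually shrinks the file, which is why the
fillfactor trick below is worth considering if you really do rewrite every
row on a regular basis.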
-- OK, let's start over and this time set fillfactor to 50

=> alter table log set (fillfactor = 50);
=> vacuum full log;
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 1963 MB

-- 2x the default size, let's see what an update does now

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 70424.752 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 1963 MB

-- hey! same size

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 58112.895 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
 pg_size_pretty
----------------
 1963 MB
(1 row)

-- still the same

So in short... vacuum seems to fall flat on its face with mass updates;
set a lower fillfactor :).

> So with our conclusion pile so far we can deduce that if we were to keep all
> of our data in two column tables (one to link them together, and the other to
> store one column of data), we stand a much better chance of making the entire
> table to be counted fit in RAM,

I dunno about that... Seems like if you only had 2 tables, both would fail
to fit in RAM fairly quickly :)

> so we simply apply the WHERE clause to a
> specific table as opposed to a column within a wider table... This seems to
> defeat the entire goal of the relational database...

Sure... that's one answer. See
http://wiki.postgresql.org/wiki/Slow_Counting for more. But the basic
ideas are:
1) estimate the count
2) use triggers and keep the count somewhere else
3) keep it in RAM
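For example, option 1 can be as cheap as reading the planner's estimate
straight out of pg_class (only as accurate as the last ANALYZE); sticking
with the 'log' table from the example above:

-- rough row count estimate, no table scan
SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
 WHERE relname = 'log';

And here's a rough sketch of option 2, a trigger-maintained counter. The
names are made up for illustration, and keep in mind the single counter
row will serialize concurrent writers, so it isn't free:

-- side table holding the current row count
CREATE TABLE log_rowcount (n bigint NOT NULL);
INSERT INTO log_rowcount SELECT count(*) FROM log;

-- bump the counter on every insert/delete
CREATE FUNCTION log_rowcount_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE log_rowcount SET n = n + 1;
        RETURN NEW;
    ELSE
        UPDATE log_rowcount SET n = n - 1;
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_rowcount_trig AFTER INSERT OR DELETE ON log
    FOR EACH ROW EXECUTE PROCEDURE log_rowcount_trig();

After that, "SELECT n FROM log_rowcount" is essentially instant no matter
how big the table gets (though being a row-level trigger it won't notice
a TRUNCATE).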