Re: Slow count(*) again...

On Wednesday 13 October 2010 00:19:26 Mark Kirkwood wrote:
> On 13/10/10 19:47, Neil Whelchel wrote:
> > Nope...
> > So, possible conclusions are:
> > 1. Even with VACUUM, database table speed degrades as tables are updated.
> > 2. Time testing on a freshly INSERTed table gives results that are not
> > real-world.
> > 3. Filesystem defragmentation helps (some).
> > 4. Cache only makes a small difference once a table has been UPDATEd.
> > 
> > I am going to leave this configuration running for the next day or so.
> > This way I can try any suggestions and play with any more ideas that I
> > have. I will try these same tests on ext4 later, along with any good
> > suggested tests.
> > I will try MySQL with the same data on both XFS and ext4.
> > -Neil-
> 
> I think the major effect you are seeing here is that the UPDATE has
> made the table twice as big on disk (even after VACUUM etc.), and it
> has gone from fitting in RAM to not fitting in RAM - so it cannot be
> effectively cached anymore.
> 
> This would not normally happen in real life (assuming UPDATEs only
> modify a small part of a table per transaction). However, administration
> updates (e.g. 'oh! - ref 1 should now be ref 2, please update
> everything') *will* cause the table size to double.
> 
> This is an artifact of Postgres's non-overwriting storage manager -
> MySQL will update in place, so you will not see this.
> 
> Try VACUUM FULL on the table and retest.
> 
> regards
> 
> Mark

There seems to be a lot of discussion about VACUUM FULL and its problems. The 
overall buzz seems to be that VACUUM FULL is a bad idea (I could be wrong 
here). It has been some time since I have read the changelogs, but I seem to 
remember that there have been some major changes to VACUUM FULL recently. 
Maybe this needs to be revisited in the documentation.
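
For reference, Mark's table-doubling theory can be checked directly by 
looking at the on-disk size of the table before and after the VACUUM FULL. 
A quick sketch (log is the table from the tests above; pg_relation_size 
counts only the main heap, and pg_total_relation_size would also include 
indexes and TOAST):

crash:~# psql -U test test -c "SELECT pg_size_pretty(pg_relation_size('log'));"

Comparing that number against available RAM should show whether the table 
crossed the fits-in-cache line.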

crash:~# time psql -U test test -c "VACUUM FULL log;"
VACUUM

real    4m49.055s
user    0m0.000s
sys     0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count   
----------
 10050886
(1 row)

real    0m9.665s
user    0m0.000s
sys     0m0.004s

A huge improvement from the minute and a half before the VACUUM FULL.

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count   
----------
 10050886
(1 row)

real    0m3.786s
user    0m0.000s
sys     0m0.000s

And the cache helps...
So, we are right back to within 10ms of where we started after INSERTing the 
data, but it took a VACUUM FULL to accomplish this (by making the table fit in 
RAM).
This is a big problem on a production machine, as VACUUM FULL takes an 
exclusive lock on the table and is likely to get in the way of INSERTing 
realtime data into it.
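
One way to reduce the need for VACUUM FULL on a heavily UPDATEd table is to 
make autovacuum much more aggressive on that table, so dead space gets 
reclaimed by plain (non-blocking) VACUUM before the table bloats past RAM. 
A sketch only - the numbers here are guesses that would need tuning:

ALTER TABLE log SET (autovacuum_vacuum_scale_factor = 0.02,
                     autovacuum_vacuum_cost_delay = 0);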

So to add to the conclusion pile:
5. When you have no control over the WHERE clause, a count(*) may have to 
scan more rows of a table than fit in RAM, and performance will be far too 
slow; count(*) is missing a LIMIT-style feature to avoid this (a workaround 
is sketched after this list).
6. Keep tables that are to be updated frequently as narrow as possible: link 
them to wider tables that store the columns which are less frequently updated.
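
For conclusion 5, there is at least a workaround: wrap the filtered scan in 
a subquery with a LIMIT, so the count is capped at a known number of rows no 
matter what WHERE clause comes in. A sketch (some_col is a placeholder for 
whatever column the WHERE clause targets):

SELECT count(*)
FROM (SELECT 1 FROM log WHERE some_col = 42 LIMIT 100000) AS capped;

If this returns 100000, the real count is "at least 100000" and the scan 
stops there instead of walking the whole table.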

So with our conclusion pile so far, we can deduce that if we were to keep all 
of our data in two-column tables (one column to link them together, and the 
other to store the data), we would stand a much better chance of making the 
entire table being counted fit in RAM, and we would simply apply the WHERE 
clause to a specific narrow table rather than to a column within a wider 
table... This seems to defeat the entire point of a relational database...
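
To make conclusion 6 concrete, the split might look something like the 
following (a hypothetical schema for illustration, not the actual test 
table):

-- narrow table: updated constantly, small enough to stay cached
CREATE TABLE log_hot (
    id     bigint PRIMARY KEY,
    status integer
);

-- wide table: written once, rarely updated
CREATE TABLE log_cold (
    id      bigint PRIMARY KEY REFERENCES log_hot(id),
    payload text
);

A count(*) with a WHERE clause on log_hot then only has to scan the narrow 
table.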

-Neil-


