
Re: db size and VACUUM ANALYZE

Bill Moran wrote:
Note that the "correct" disk size for your database is probably closer
to the 1.6G you were seeing before.

This might be the case, but how do I find out what the "correct" sizes are?

I have a script that runs the following queries:

SELECT relpages * 8192 AS size_in_bytes, relname
FROM pg_class
WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public')
ORDER BY size_in_bytes DESC
LIMIT 10;

SELECT SUM(sizes.size_in_bytes) AS total_size_for_top_10_tables
FROM (SELECT relpages * 8192 AS size_in_bytes, relname
      FROM pg_class
      WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public')
      ORDER BY size_in_bytes DESC
      LIMIT 10) AS sizes;

SELECT SUM(sizes.size_in_bytes) AS total_size_for_all_tables
FROM (SELECT relpages * 8192 AS size_in_bytes, relname
      FROM pg_class
      WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public')) AS sizes;

Results before (1.6G db):


 size_in_bytes |       relname
---------------+----------------------
     806387712 | cs_ver_digests_pkey
     103530496 | oai_edi_atts_pkey
      62021632 | cs_ver_paths
      61734912 | cs_ver_digests
      55721984 | cs_fil_paths
      45309952 | met_files
      38412288 | met_versions
      26247168 | cs_ver_content_types
      25444352 | met_edi_ver
      23724032 | met_edi_atts
(10 rows)

 total_size_for_top_10_tables
------------------------------
                   1248534528
(1 row)

 total_size_for_all_tables
---------------------------
                1467809792
(1 row)


Results now (600M db):

 size_in_bytes |          relname
---------------+---------------------------
      62169088 | cs_ver_paths
      55828480 | cs_fil_paths
      45441024 | met_files
      42000384 | cs_ver_digests
      37552128 | met_versions
      25509888 | met_edi_ver
      24215552 | cs_ver_content_types
      20717568 | met_edi_atts
      18186240 | met_edi_ver_pkey
      13565952 | cs_ver_content_types_pkey
(10 rows)

 total_size_for_top_10_tables
------------------------------
                    345186304
(1 row)

 total_size_for_all_tables
---------------------------
                 467476480
(1 row)
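
A variant of the same check that reads the current on-disk size directly (including indexes and TOAST), assuming a server of 8.1 or later where pg_total_relation_size() and pg_size_pretty() are available, might look like this:

SELECT pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'   -- ordinary tables; their indexes/TOAST are included by pg_total_relation_size
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;

The relpages figures used above are only refreshed by VACUUM/ANALYZE, so the two approaches can disagree right after heavy write activity.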



This allows PG some free space
within the data files to add/remove records.  VACUUM FULL removes this
space, and you'll likely find that the files will simply expand to
use it again.  VACUUM (without FULL) keeps that space at an equilibrium.
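
(A minimal way to see how much of that reusable space the server is actually tracking, assuming an 8.x release where a database-wide VACUUM VERBOSE ends with a free space map summary:)

-- run as a superuser so every relation is covered
VACUUM VERBOSE;
-- the trailing output should summarize the free space map and hint
-- whether the number of page slots needed exceeds max_fsm_pages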

I don't mind a slight performance degradation; the problem is that this is the second time that, beyond a certain db size, the degradation has become almost runaway.

As to performance degradation, you'll always see performance hits as
your database size increases.  I'm assuming from your need to ask about
this issue that the degradation was significant.

Yes, to the point of being unacceptable (that is, queries took something like 20-30 seconds).

In that case, you first
want to make sure that the tables in the database have indexes in
all the right places -- in my experience, this is the biggest cause of
performance issues.  Use of EXPLAIN ANALYZE on queries that are performing
slowly will usually indicate where indexes can help.

I'll try, though that will not be easy, as the queries are complex and were not written by me (it's a closed system).
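
A rough sketch of that workflow (the table and column names here are purely hypothetical, just to illustrate the steps):

-- hypothetical names, only to illustrate the EXPLAIN ANALYZE workflow
EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_column = 42;

-- if the plan shows a sequential scan dominating the runtime, an index on
-- the filtered column may help:
CREATE INDEX some_table_some_column_idx ON some_table (some_column);
ANALYZE some_table;

-- re-run the EXPLAIN ANALYZE to confirm the planner now uses the index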


From there, you may simply have too little hardware for the database to
run at the speed you expect.

You see, that's the weird thing: the machine in question has 4 CPUs and 4G of RAM. When the performance was unacceptable, the loadavg was around 1, all CPUs were only slightly loaded, and iostat didn't show much happening on the disks. The one thing I remember is that there were many postmaster processes (around 20), with huge virtual sizes (around 800M) and large resident sizes (around 300M).

On top of having the pg_dump backup, I copied the database's binary files while PG was stopped. I could experiment with those files (running them under the same PG config on another machine).

Giving it more RAM is cheap and tends to
work wonders.  Any time the system runs out of RAM, it needs to use disk
instead, which significantly hurts performance.

This is my memory config:

shared_buffers = 768MB
temp_buffers = 32MB                     # min 800kB
work_mem = 32MB                         # min 64kB
max_stack_depth = 256MB                 # min 100kB
max_fsm_pages = 153600


% sysctl -a | grep shm
kernel.shmmni = 4096
kernel.shmall = 262144
kernel.shmmax = 1073741824
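
A quick way to double-check what the server actually picked up for those settings (assuming 8.2 or later, where pg_settings exposes a unit column):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'temp_buffers', 'work_mem',
               'max_stack_depth', 'max_fsm_pages');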

