Hello again,
Just to give a bit of background, in case it is useful: this is my
family tree website, treefic.com. I have a schema for each user, each
with about a dozen tables. In most cases the tables are small, i.e.
tens of entries, but the users I care about are the ones with tens of
thousands of people in their trees. The schemas are independent of each
other. Example web page: http://treefic.com/treefic/royal92
Jan Wieck <JanWieck@xxxxxxxxx> writes:
> PostgreSQL itself doesn't work too well with tens of thousands of
> tables.
I've specifically asked about this here before. This is obviously
important for my application, so I invite all readers to share any
thoughts they might have about possible problems with large numbers of
tables. I also create and drop large numbers of temporary tables - can
anyone think of any additional problems with that?
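
The temporary table usage is essentially this pattern (a simplified sketch; the real queries are more involved):

    -- Built per request and dropped again; each cycle inserts and
    -- then deletes rows in pg_class and pg_attribute.
    CREATE TEMPORARY TABLE ancestors (
        person_id integer,
        depth     integer
    );
    -- ...populate it, join against it, render the page...
    DROP TABLE ancestors;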
Issues I have discussed here before include tab-completion in psql
(unimportant) and autovacuum's O(n^2) performance (important).
> Okay, I should be more specific. The problem with tens of thousands of
> tables does not exist just because of them being there. It will emerge
> if all those tables are actually used because it will mean that you'd
> need all the pg_class and pg_attribute rows cached and also your vfd
> cache will constantly rotate.
If many trees are being viewed simultaneously, some other part of the
system will become the bottleneck first. Within any five-minute period,
say, only a few hundred tables will be in use.
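
For anyone curious about the scale involved, counting the catalog rows directly gives a rough idea of what would need to be cached:

    -- pg_class has a row for every table, index, sequence, etc.;
    -- pg_attribute has a row per column (including system columns).
    SELECT count(*) FROM pg_class;
    SELECT count(*) FROM pg_attribute;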
>> Then again, the stats file is only written. There is nothing that
>> actually forces the blocks out. On a busy system, one individual
>> stats file will be created, written to, renamed, live for 500ms and
>> be thrown away by the next stats file's rename operation. I would
>> assume that with a decent filesystem and appropriate OS buffers, none
>> of the data blocks of most stats files even hit the disk. I must be
>> missing something.
> This is possibly true --- Phil, do you see actual disk I/O happening
> from the stats writes, or is it just kernel calls?
During my tests the system was idle; I would run "psql -c 'select 1;'"
and see the blocks appear in vmstat's "bo" column a couple of seconds
later. As I understand it, that indicates actual I/O, and the delay
suggests that the kernel is flushing it. When the system is busy it is
harder to see what is going on, and it is possible that at least some
of this activity was not actually reaching the disk. Typically vmstat
reports a lot more write bandwidth than read bandwidth (by a factor of
5 or so); any advice on how to identify which files or processes are
involved would be appreciated. I had previously imagined that it could
be the temporary tables. This is Linux 2.4.26 with an ext3 filesystem.
Since I disabled stats earlier, my stats file is still quite small;
presumably it will gradually grow back. In the meantime I cannot do any
experiments.
Thanks as ever for your prompt responses.
Regards,
--Phil.