Hello again,
Just to give a bit of background, in case it is useful: this is my
family tree website, treefic.com. I have a schema for each user, each
with about a dozen tables. In most cases the tables are small, i.e.
tens of entries, but the users I care about are the ones with tens of
thousands of people in their trees. The schemas are independent of each
other. Example web page: http://treefic.com/treefic/royal92
Jan Wieck <JanWieck@xxxxxxxxx> writes:
> PostgreSQL itself doesn't work too well with tens of thousands of
> tables.
I've specifically asked about this here before. This is obviously
important for my application, so I invite all readers to share any
thoughts they might have about possible problems with large numbers of
tables. I also create and drop large numbers of temporary tables - can
anyone think of any additional problems with that?
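
The temporary table usage is essentially this pattern (a simplified sketch; the real queries are more involved):

    -- Built per request and dropped again; each cycle inserts and
    -- then deletes rows in pg_class and pg_attribute.
    CREATE TEMPORARY TABLE ancestors (
        person_id integer,
        depth     integer
    );
    -- ...populate it, join against it, render the page...
    DROP TABLE ancestors;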
Issues I have discussed here before include tab-completion in psql
(unimportant) and autovacuum's O(n^2) performance (important).
> Okay, I should be more specific. The problem with tens of thousands of
> tables does not exist just because of them being there. It will emerge
> if all those tables are actually used because it will mean that you'd
> need all the pg_class and pg_attribute rows cached and also your vfd
> cache will constantly rotate.
If many trees are being viewed simultaneously, some other part of the
system will become the bottleneck first. Within any five-minute period,
say, only a few hundred tables will be in use.
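
For anyone curious about the scale involved, counting the catalog rows directly gives a rough idea of what would need to be cached:

    -- pg_class has a row for every table, index, sequence, etc.;
    -- pg_attribute has a row per column (including system columns).
    SELECT count(*) FROM pg_class;
    SELECT count(*) FROM pg_attribute;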
>> Then again, the stats file is only written. There is nothing that
>> actually forces the blocks out. On a busy system, one individual
>> stats file will be created, written to, renamed, live for 500ms and
>> be thrown away by the next stats file's rename operation. I would
>> assume that with a decent filesystem and appropriate OS buffers, none
>> of the data blocks of most stats files even hit the disk. I must be
>> missing something.
> This is possibly true --- Phil, do you see actual disk I/O happening
> from the stats writes, or is it just kernel calls?
During my tests the system was idle; I would run "psql -c 'select 1;'"
and see the blocks appear in vmstat's "bo" column a couple of seconds
later. As I understand it, that indicates actual I/O, and the delay
suggests that the kernel is flushing it. When the system is busy it is
harder to see what is going on, and it is possible that at least some
of this activity was not actually reaching the disk. Typically vmstat
reports a lot more write bandwidth than read bandwidth (by a factor of
5 or so); any advice on how to identify which files or processes are
involved would be appreciated. I had previously imagined that it could
be the temporary tables. This is Linux 2.4.26 with an ext3 filesystem.
Since I disabled stats earlier, my stats file is still quite small;
presumably it will gradually grow back. In the meantime I cannot do any
experiments.
Thanks as ever for your prompt responses.
Regards,
--Phil.