Re: Megabytes of stats saved after every connection


Hello again,

Just to give a bit of background, in case it is useful: this is my family tree website, treefic.com. I have a schema for each user, each with about a dozen tables. In most cases the tables are small, i.e. tens of entries, but the users I care about are the ones with tens of thousands of people in their trees. The schemas are independent of each other. Example web page: http://treefic.com/treefic/royal92
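To make the layout concrete, a minimal sketch of the schema-per-user approach (the schema and table names here are made up for illustration, not the site's actual DDL):

```sql
-- Hypothetical per-user layout: one schema per user, each holding
-- its own copy of the application's roughly dozen tables.
CREATE SCHEMA user_royal92;

CREATE TABLE user_royal92.person (
    person_id  serial PRIMARY KEY,
    name       text NOT NULL
);

CREATE TABLE user_royal92.parent_child (
    parent_id  integer REFERENCES user_royal92.person,
    child_id   integer REFERENCES user_royal92.person
);
-- ...and so on for the remaining tables in each user's schema.
```

With thousands of users this multiplies the catalog entries: each schema contributes its own pg_class and pg_attribute rows, which is exactly the scaling concern raised below.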

Jan Wieck <JanWieck@xxxxxxxxx> writes:
> PostgreSQL itself doesn't work too well with tens of thousands of tables.

I've specifically asked about this here before. This is obviously important for my application so I invite all readers to share any thoughts they might have about possible problems with large numbers of tables. I also create and drop large numbers of temporary tables - can anyone think of any additional problems with that?

Issues I have discussed here before include tab-completion in psql (unimportant) and autovacuum's O(n^2) performance (important).

> Okay, I should be more specific. The problem with tens of thousands of tables does not exist just because they are there. It emerges when all those tables are actually used, because then all of their pg_class and pg_attribute rows need to be cached and the vfd cache will constantly rotate.

If many trees are being viewed simultaneously, some other part of the system will be the bottleneck. Within any given period of, say, 5 minutes, only hundreds of tables will be in use.

> Then again, the stats file is only written; nothing actually forces the blocks out. On a busy system, an individual stats file will be created, written to, renamed, live for 500ms and then be thrown away by the next stats file's rename operation. I would assume that with a decent filesystem and appropriate OS buffering, none of the data blocks of most stats files ever hit the disk. I must be missing something.
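The create/write/rename lifecycle described here is the usual atomic-replace trick: write the complete new snapshot to a temporary file, then rename it over the old one, so readers never observe a partially written file. A minimal shell sketch (the filenames and contents are made up for illustration):

```shell
# Write-then-rename: build the full new contents in a temp file first.
STATS=pgstat.stat
TMP=pgstat.tmp

printf 'fake stats snapshot\n' > "$TMP"  # write the complete new snapshot
mv "$TMP" "$STATS"                       # atomic replace within one filesystem

cat "$STATS"  # prints: fake stats snapshot
```

Because rename is atomic on the same filesystem, a short-lived file like this may indeed be overwritten before the kernel ever flushes its data blocks, which is Jan's point.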

> This is possibly true --- Phil, do you see actual disk I/O happening
> from the stats writes, or is it just kernel calls?

During my tests the system was idle; I would run "psql -c 'select 1;'" and see the blocks appear in vmstat's "bo" column a couple of seconds later. As I understand it, that indicates actual I/O, and the delay suggests the data is being flushed by the kernel. When the system is busy it is harder to see what is going on, and it is possible that at least some of this activity never reached the disk. Typically vmstat shows a lot more write bandwidth than read bandwidth (by a factor of 5 or so); any advice on how to identify the files or processes involved would be appreciated. I had previously imagined it might be temporary tables. This is Linux 2.4.26 with an ext3 filesystem.

Having disabled stats earlier, my stats file is still quite small. Presumably it will gradually grow back; in the meantime I cannot do any experiments.

Thanks as ever for your prompt responses.

Regards,

--Phil.


