On Thu, Nov 8, 2012 at 3:36 AM, Denis <socsam@xxxxxxxxx> wrote: > We have a web application where we create a schema or a database with a > number of tables in it for each customer. Now we have about 2600 clients. > > The problem we met using a separate DB for each client is that the creation > of new DB can take up to 2 minutes, that is absolutely unacceptable. Using > schemes instead (one DB with a number of schemes containing similar tables > in it) solved this problem (schemes are created in a couple of seconds), but > created two other blocking points: Sure: db creation can be a bear particularly on servers already under load; it's i/o intensive. I think you made the right choice: it's not a good idea to create databases via user input (but if I *had* to do that, I would probably be pre-creating them). > 1. sometimes creation of a new table in a schema takes up to 5 seconds. In > case when we have create up to 40 tables in a schema this takes way too much > time. How are you creating the tables. What's the iowait on the sever in situations like this? If the file system is binding you here, there's a not a lot you can do other than to try and pre-create or improve i/o performance. > 2. "pg_dump -n schema_name db_name" takes from 30 to 60 seconds, no matter > how big is the amount of data in the schema. Also, the dump of the tables > structure only takes at least 30 seconds. Basing on this topic > http://postgresql.1045698.n5.nabble.com/pg-dump-and-thousands-of-schemas-td5709766i60.html, > pg_dump always analyses ALL the tables in the DB, i.e. in my case more than > 100 000 tables. That may be correct. To prove it, try: pg_dump -s -n schema_name db_name where '-s' is the switch to dump only schema. if things are still slow, try logging queries from pg_dump (maybe enable log_min_duration_statement if you can) and maybe something turns up that can be optimized. One you have the query, explain analyze it and post it to the list. postgresql has been continuously improving in terms of bulk table handling over the years and there may be some low hanging fruit there. There may even be some simple database tweaks you can make to improve thing without code changes. Magnus already answered this pretty well, but I'd like to add: the database engine scales pretty well to large amounts of table but in particular cases the tools are not. The reason for this is that the engine deals with internal optimized structures while the tools have to do everything over SQL. That said, there may be some low hanging optimization fruit; it's a lot easier to hack on client side tools vs the backend. > I know you guys will ask me about selecting this particular application > architecture. Sharding by schema is pretty common actually. 6000 schema holding tables is a lot -- so the question you should be getting is 'given the current state of affairs, have you considered distributing your clients across more than one server'. What if you suddenly sign 10,000 more clients? > Different clients have different activity rate and we can select different > backup strategies according to it. This would be impossible in case we keep > all the clients data in one table. > Besides all the above mentioned, the probability of massive data corruption > (if an error in our web application occurs) is much higher. sure -- all large databases struggle with backups once the brute force dump starts to become impractical. the way forward is to explore various high availability options -- PITR, HS/SR etc. distributing the backup load across shards is also good as long as your rigorous about not involving any shared structures. > Not to start a holywar, but FYI: in a similar project where we used MySQL > now we have about 6000 DBs and everything works like a charm. no worries. postgres schemas are fancier than mysql databases and this is one of those things were extra features really do impact performance :-). merlin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance