On Fri, Nov 9, 2012 at 7:15 AM, Craig Ringer <craig@xxxxxxxxxxxxxxx> wrote:
> On 11/08/2012 09:29 PM, Denis wrote:
>> Ok guys, it was not my intention to hurt anyone's feelings by mentioning
>> MySQL. Sorry about that.
> It's pretty silly to be upset by someone mentioning another DB product.
> I wouldn't worry.
>> There simply was a project with a similar
>> architecture built using MySQL. When we started the current project, I have
>> made a decision to give PostgreSQL a try.
> It's certainly interesting that MySQL currently scales to much larger
> table counts better than PostgreSQL appears to.
>
> I'd like to see if this can be improved down the track. Various people
> are doing work on PostgreSQL scaling and performance, so with luck huge
> table counts will come into play there. If nothing else, supporting
> large table counts is important when dealing with very large amounts of
> data in partitioned tables.
>
> I think I saw mention of better performance with higher table counts in
> 9.3 in -hackers, too.
>
>> I would recommend you to refresh the info here
>> http://wiki.postgresql.org/wiki/FAQ. There is a question "What is the
>> maximum size for a row, a table, and a database?". Please add there info on
>> maximum DBs number and tables number one DB can contain while PostgreSQL
>> continues to work properly.
> Yeah, a number of people have been thrown by that. Technical limitations
> aren't the same as practical limitations, and in some cases the
> practical limitations are lower.

Yes. And the fact is that PostgreSQL doesn't actually have a big problem with
this scenario - pg_dump does. It's mainly a tool issue, not a database engine
one. For example, pgAdmin used to have problems even with a much smaller
number of databases than that - I think that is better in current releases,
but I'm not 100% sure. And you can imagine what happens to a tool that tries
to graph per-database values into a single graph when you have thousands of
databases.

PostgreSQL isn't perfect in these cases - as noted, just creating a new schema
can take slightly more than the usual millisecond. But it works, and I've
personally never come across a scenario where it's not "good enough" (which
doesn't mean such scenarios don't exist, of course).

pg_dump and pg_dumpall, however, do have issues, as noted elsewhere as well.
But as Pavel mentioned, and others certainly have before, there are other ways
to deal with backups that solve this problem. They may not be perfect, but
they get you pretty far. They may not be a match for the requirements in this
case - but it's not a general limitation of the db.

FWIW, have you looked into using proper backups with PITR and just rolling
forward through the transaction log with pause_at_recovery_target set? That's
an excellent way to track down the "we lost data sometime between <x> and <y>
but don't know when" kind of problem. I've yet to find a case where that's not
easier than repeatedly restoring even a fairly small pg_dump based backup.
Plus, it gives you much better granularity, which leads to better results for
your customer. *And* it takes away the dependency on pg_dump's performance
issues. (There's a rough sketch of what that setup looks like at the end of
this mail.)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
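
PS: For the archives, a rough sketch of the kind of setup I mean. It's not
from any particular installation - it assumes a 9.1/9.2-era configuration
using recovery.conf, with WAL archiving already in place before the incident,
and all paths and the target timestamp are placeholders to adapt:

    # postgresql.conf on the production server - archiving has to be set up
    # before the incident, and hot_standby-capable WAL lets you query the
    # restored copy while recovery is paused:
    wal_level = hot_standby
    archive_mode = on
    archive_command = 'cp %p /mnt/wal_archive/%f'    # placeholder path

    # Take a base backup, e.g. with pg_basebackup (9.1+):
    #   pg_basebackup -D /mnt/base_backup -Fp -P

    # postgresql.conf on the restored copy:
    hot_standby = on

    # recovery.conf in the restored data directory - roll forward to a
    # guessed point in time and pause there so you can look around:
    restore_command = 'cp /mnt/wal_archive/%f %p'
    recovery_target_time = '2012-11-08 12:00:00'     # adjust between runs
    pause_at_recovery_target = true

While recovery is paused you can connect read-only and check whether the data
is still there at that point. Moving the target forward is just a matter of
changing recovery_target_time and restarting; going backwards means restoring
the base backup again, so it's usually easiest to aim early. Once you've found
the point you want, pg_xlog_replay_resume() ends recovery there.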