Dear Greg, Have you checked PostgresXL ? with millions of table, how the apps choose which table is approriate? in my opinion, with that scale it should go with parallel query with data sharing like what PostgresXL is done. Thanks, Julyanto SUTANDANG Equnix Business Solutions, PT (An Open Source and Open Mind Company) www.equnix.co.id Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta Pusat T: +6221 22866662 F: +62216315281 M: +628164858028 Caution: The information enclosed in this email (and any attachments) may be legally privileged and/or confidential and is intended only for the use of the addressee(s). No addressee should forward, print, copy, or otherwise reproduce this message in any manner that would allow it to be viewed by any individual not originally listed as a recipient. If the reader of this message is not the intended recipient, you are hereby notified that any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is strictly prohibited. If you have received this communication in error, please immediately notify the sender and delete this message.Unless it is made by the authorized person, any views expressed in this message are those of the individual sender and may not necessarily reflect the views of PT Equnix Business Solutions. On Mon, Sep 26, 2016 at 9:50 AM, Greg Spiegelberg <gspiegelberg@xxxxxxxxx> wrote: > Hey all, > > Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time > has said not to have millions of tables. I too have long believed it until > recently. > > AWS d2.8xlarge instance with 9.5 is my test rig using XFS on EBS (io1) for > PGDATA. Over the weekend, I created 8M tables with 16M indexes on those > tables. Table creation initially took 0.018031 secs, average 0.027467 and > after tossing out outliers (qty 5) the maximum creation time found was > 0.66139 seconds. Total time 30 hours, 31 minutes and 8.435049 seconds. > Tables were created by a single process. Do note that table creation is > done via plpgsql function as there are other housekeeping tasks necessary > though minimal. > > No system tuning but here is a list of PostgreSQL knobs and switches: > shared_buffers = 2GB > work_mem = 48 MB > max_stack_depth = 4 MB > synchronous_commit = off > effective_cache_size = 200 GB > pg_xlog is on it's own file system > > There are some still obvious problems. General DBA functions such as VACUUM > and ANALYZE should not be done. Each will run forever and cause much grief. > Backups are problematic in the traditional pg_dump and PITR space. Large > JOIN's by VIEW, SELECT or via table inheritance (I am abusing it in my test > case) are no-no's. A system or database crash could take potentially hours > to days to recover. There are likely other issues ahead. > > You may wonder, "why is Greg attempting such a thing?" I looked at > DynamoDB, BigTable, and Cassandra. I like Greenplum but, let's face it, > it's antiquated and don't get me started on "Hadoop". I looked at many > others and ultimately the recommended use of each vendor was to have one > table for all data. That overcomes the millions of tables problem, right? > > Problem with the "one big table" solution is I anticipate 1,200 trillion > records. Random access is expected and the customer expects <30ms reads for > a single record fetch. > > No data is loaded... yet Table and index creation only. I am interested in > the opinions of all including tests I may perform. If you had this setup, > what would you capture / analyze? I have a job running preparing data. I > did this on a much smaller scale (50k tables) and data load via function > allowed close to 6,000 records/second. The schema has been simplified since > and last test reach just over 20,000 records/second with 300k tables. > > I'm not looking for alternatives yet but input to my test. Takers? > > I can't promise immediate feedback but will do my best to respond with > results. > > TIA, > -Greg -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance