Just stumbled across this recent article published in the Communications of the ACM:

http://cacm.acm.org/magazines/2009/8/34493-the-pathologies-of-big-data/fulltext

The author shares some insights into the difficulties of processing, in Postgres, a 6.75 billion-row dummy table representing census-type data for everyone on earth. I'd really like to replicate the author's experiment, but it's not clear from the article what his table definition looks like. He claims to be using a 16-byte record to store the several columns he needs for each row, so perhaps he's using a user-defined type? (I've sketched a guess at a plain fixed-width definition below.)

The author implies with his definition of "big data" that the dataset he analyzed is "... too large to be placed in a relational database ...". From Fig. 2, the SELECT query he ran took just under 10^5 seconds (roughly 28 hours) on the full 6.75 billion rows. That didn't seem surprising to me given how many rows have to be processed, but in a recent post on comp.databases.ingres someone claimed that on a far inferior PC, Ingres ran the same SELECT query in 105 minutes! That would be very impressive (roughly a 16-fold improvement over Postgres) if true.

The author complained that "on larger tables [Postgres' planner] switched to sorting by grouping columns", which he blamed for the slow query execution. I don't personally see that plan as a problem, but maybe someone can enlighten me (some thoughts on it in the second sketch below).

One intriguing tidbit I picked up from the article: "in modern systems, as demonstrated in the figure, random access to memory is typically slower than sequential access to disk." In hindsight this seems plausible, since modern disks can sustain sequential reads at well over 100 MB/sec.

Anyway, it would be very interesting to attempt to speed up the author's query if at all possible.
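To make the discussion concrete, here is one guess at a plain fixed-width table definition that comes to 16 bytes of column data per row. The article doesn't give the DDL, so the column names and types below are purely my assumption, not the author's actual schema:

    CREATE TABLE census (
        person_id  integer,   -- 4 bytes
        country    integer,   -- 4 bytes
        age        smallint,  -- 2 bytes
        sex        smallint,  -- 2 bytes
        income     integer    -- 4 bytes
    );
    -- 4 + 4 + 2 + 2 + 4 = 16 bytes of column data per row, no alignment padding

Note that 6.75 billion rows x 16 bytes is only about 108 GB of raw data, but Postgres adds roughly 23 bytes of tuple header plus a 4-byte line pointer per row, so the on-disk table would be closer to 300 GB before any indexes.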
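On the planner complaint: I'm assuming the query is an aggregate grouped by a few low-cardinality columns, something along the lines of the sketch below (again my guess, not the article's actual query). For a GROUP BY like this, the planner will only pick a HashAggregate if it estimates that the hash table of groups fits in work_mem; otherwise it falls back to sorting the whole table by the grouping columns and doing a GroupAggregate, which sounds like the plan the author was complaining about. If the number of distinct groups really is small, raising work_mem for the session should let the hash plan win:

    -- illustrative only; column names follow the guessed schema above
    SET work_mem = '512MB';           -- per-sort / per-hash memory budget

    EXPLAIN
    SELECT country, age, sex, count(*)
    FROM census
    GROUP BY country, age, sex;       -- hoping for HashAggregate rather than Sort + GroupAggregate

If the planner still insists on sorting, a bad n_distinct estimate on the grouping columns could be to blame, in which case ANALYZE (possibly after raising the statistics target on those columns) would be the next thing I'd try.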
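As for the memory-versus-disk quote, a back-of-envelope calculation makes it look reasonable (the latency figure is my assumption, not from the article): a main-memory reference that misses the CPU caches costs on the order of 100 ns, so if you are chasing scattered 16-byte records and only use 16 bytes of each cache line you touch, you get roughly

    16 bytes / 100 ns = ~160 MB/sec of useful data

which is in the same ballpark as the 100+ MB/sec a single disk can stream sequentially. Throw in TLB misses and it is easy to see how random memory access could end up slower than a sequential disk scan.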