gnuoytr@xxxxxxx writes:
> Time for my pet meme to wiggle out of its hole (next to Phil's, and a
> day later). For PG to prosper in the future, it has to embrace the
> multi-core/processor/SSD machine at the query level. It has to. And
> it has to because the Big Boys already do so, to some extent, and
> they've realized that the BCNF schema on such machines is supremely
> efficient. PG/MySql/OSEngineOfChoice will get left behind simply
> because the efficiency offered will be worth the price.
>
> I know this is far from trivial, and my C skills are such that I can
> offer no help. These machines have been the obvious "current" machine
> in waiting for at least 5 years, and those applications which benefit
> from parallelism (servers of all kinds, in particular) will filter out
> the winners and losers based on exploiting this parallelism.
>
> Much as it pains me to say it, but the MicroSoft approach to software:
> write to the next generation processor and force users to upgrade,
> will be the winning strategy for database engines. There's just way
> too much to gain.

I'm not sure how true that is, really (the "too much to gain" part in
particular). I know that Jan Wieck and I have been bouncing thoughts on
valid uses of threading off each other for *years* now, and it tends to
be interesting but difficult to the point of impracticality.

But how things play out is quite fundamentally different for different
usage models. It's useful to cross items off the list, so we're left
with the tough ones that are actually a problem.

1. For instance, OLTP applications, which generate a lot of concurrent
connections, already do perfectly well in scaling on multi-core
systems. Each connection is a separate process, and that already
harnesses multi-core systems perfectly well. Things have improved a lot
over the last 10 years, and there may yet be further improvements to be
found, but it seems pretty reasonable to me to say that the OLTP
scenario can be treated as "solved" in this context.
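To make that concrete, here is a minimal sketch of the
process-per-connection model (illustrative C only, not actual
PostgreSQL source; the port number and the handle_session() stub are
made up for the example). A listener fork()s a child for each incoming
connection, and the OS scheduler is then free to run each child on any
available core, which is why many concurrent sessions already spread
across a multi-core box without any threading inside the server:

  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <signal.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>

  /* Stand-in for the real per-connection work: parse, plan, execute. */
  static void handle_session(int client_fd)
  {
      const char greeting[] = "hello from backend\n";

      write(client_fd, greeting, sizeof(greeting) - 1);
      close(client_fd);
  }

  int main(void)
  {
      int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
      struct sockaddr_in addr;

      memset(&addr, 0, sizeof(addr));
      addr.sin_family = AF_INET;
      addr.sin_addr.s_addr = htonl(INADDR_ANY);
      addr.sin_port = htons(54321);      /* arbitrary example port */

      if (listen_fd < 0 ||
          bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
          listen(listen_fd, 16) < 0)
      {
          perror("socket/bind/listen");
          return 1;
      }

      signal(SIGCHLD, SIG_IGN);          /* let the kernel reap children */

      for (;;)
      {
          int client_fd = accept(listen_fd, NULL, NULL);

          if (client_fd < 0)
              continue;

          if (fork() == 0)               /* child: one process per connection */
          {
              close(listen_fd);
              handle_session(client_fd);
              _exit(0);
          }
          close(client_fd);              /* parent just keeps listening */
      }
  }

Nothing in that loop needs to know how many cores the machine has;
concurrency across sessions falls out of the process model for free,
which is the sense in which the OLTP case is already "solved."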
The scenario where I can squint and see value in trying to multithread
is the contrast to that, namely OLAP. The case where we only use a
single core, today, is where there's only a single connection, and a
single query, running.

But that can reasonably be further constrained; not every
single-connection query could be improved by trying to spread work
across cores. We need to add some further assumptions:

2. The query needs to NOT be I/O-bound. If it's I/O-bound, then your
system is waiting for the data to come off disk rather than processing
that data.

That condition can be somewhat further strengthened: it further needs
to be a query where multiprocessing would not increase the I/O burden.

Between those two assumptions, that cuts the scope of usefulness to a
very considerable degree.

And if we *are* multiprocessing, we introduce several new problems,
each of which is quite troublesome:

 - How do we decompose the query so that the pieces are processed in
   ways that improve processing time?

   In effect, how to generate a parallel query plan?

   It would be more than stupid to consider this to be "obvious."
   We've got 15-ish years' worth of query optimization efforts that
   have gone into Postgres, and many of those changes were not
   "obvious" until after they got thought through carefully. This
   multiplies the complexity, and the opportunity for error.

 - Coordinating processing becomes quite a bit more complex.

   Multiple threads/processes are accessing parts of the same data
   concurrently, so a "parallelized query" that harnesses 8 CPUs might
   generate 8x as many locks and analogous coordination points.

 - Platform specificity

   Threading is a problem in that each OS platform has its own
   implementation, and even when they claim to conform to common
   standards, they still have somewhat different interpretations.
   This tends to go in one of the following directions:

   a) You have to pick one platform to do threading on.

      Oops. There's now PostgreSQL-Linux, which is the only platform
      where our multiprocessing thing works. It could be worse than
      that; it might work only on a particular version of a particular
      OS...

   b) You follow some apparently portable threading standard

      And find that things are hugely buggy because the platforms
      follow the standard a bit differently. And perhaps this means
      that, analogous to a), you've got a set of platforms where this
      "works" (for some value of "works"), and others where it can't.
      That's almost as evil as a).

   c) You follow some apparently portable threading standard

      And need to wrap things in a pretty thick safety blanket to make
      sure it is compatible with all the bugs in interpretation and
      implementation. Complexity++, and performance probably suffers.

None of these are particularly palatable, which is why threading
proposals get a lot of pushback.

At the end of the day, if this is only providing value for a subset of
use cases, involving peculiar-ish conditions, well, it's quite likely
wiser for most would-be implementors to spend their time on
improvements likely to help a larger set of users, one that might, in
fact, include those who imagine that this parallelization would be
helpful.
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://www3.sympatico.ca/cbbrowne/x.html
FLORIDA: Where your vote counts and counts and counts.