jnasby@xxxxxxxxxxxxx ("Jim C. Nasby") writes:
> On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
>> Martha Stewart called it a Good Thing when smarlowe@xxxxxxxxxxxxxxxxx (Scott Marlowe) wrote:
>> > On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
>> >> > Has someone been working on the problem of splitting a query into
>> >> > pieces and running it on multiple CPUs / multiple machines? Yes.
>> >> > Bizgress has done that.
>> >>
>> >> I believe that is limited to Bizgress MPP, yes?
>> >
>> > Yep. I hope that someday it will be released to the postgresql global
>> > dev group for inclusion. Or at least parts of it.
>>
>> Question: Does Bizgress/MPP use threading for this concurrency?
>> Or forking?
>>
>> If it does so via forking, that's more portable, and less dependent on
>> the specific complexities of threading implementations (which amount to
>> non-portability ;-)).
>>
>> Most times Jan comes to town, we spend a few minutes musing about the
>> "splitting queries across threads" problem, and dismiss it again; if
>> there's the beginning of a "split across processes," that's decidedly
>> neat :-).
>
> Correct me if I'm wrong, but there's no way to (reasonably) accomplish
> that without having some dedicated extra processes lying around that
> you can use to execute the queries, no? In other words, the cost of a
> fork() during query execution would be prohibitive...

Counterexample...

The sort of scenario we keep musing about is one where you split off a
(thread|process) for each partition of a big table. There is in fact a
natural such partitioning, in that tables are split at the 1GB mark by
default.

Consider a join against two tables that are each 8GB in size (i.e.,
each consists of 8 data files). Let's assume the query plan calls for
seq scans on both. You *know* you'll be reading through 16 files, each
1GB in size. Spawning a process for each of those files doesn't strike
me as "prohibitively expensive."
A naive reading of this is that you might start with one backend
process, which then spawns 16 more. Each of those backends scans
through one of the 16 files, throwing relevant tuples into shared
memory to be aggregated/joined by the central one.

That is a scenario where the fork()s would hardly be noticeable.

> FWIW, DB2 executes all queries in a dedicated set of processes. The
> process handling the connection from the client will pass a query
> request off to one of the executor processes. I can't remember which
> process actually plans the query, but I know that the executor runs
> it.

It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing is an
immaterial cost.
-- 
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/languages.html
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon