Re: Are there any options to parallelize queries?

Pavel Stehule <pavel.stehule@xxxxxxxxx> · Tue, 21 Aug 2012 11:20:50 +0200



Hello

2012/8/21 Seref Arikan <serefarikan@xxxxxxxxxxxxxxxxxxxxx>:
> Dear all,
> I am designing an electronic health record (EHR) repository that uses PostgreSQL
> as its RDBMS. For those who may find the topic interesting, the
> EHR standard I specialize in is openEHR: http://www.openehr.org/
>

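Have a look at Stado (formerly GridSQL), which spreads query execution across
multiple PostgreSQL nodes:
http://stormdb.com/community/stado?destination=node%2F8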

Regards

Pavel Stehule


> My design makes use of parallel execution in the layers above the DB, and it
> seems to scale quite well. However, I have a scaling problem at hand. A single
> patient can have up to 1 million different clinical data entries on his/her
> own, after a few decades of usage. Clinicians do love their data, and
> especially in chronic disease management, they demand access to whatever
> data exists. If you have 20 years of data for a diabetic patient, for
> example, they'll want to look for trends in it, or even scroll through all
> of it, maybe with some filtering.
> My requirement is to be able to process those 1 million records as fast as
> possible. In case of population queries, we're talking about billions of
> records. Each clinical record (even with all the optimizations our domain
> has developed over the last 30 or so years) maps to a number of rows, so you
> can see that this is really big data: imagine a national diabetes registry
> with lifetime data for a few million patients.
> I am ready to consider Hadoop or other non-transactional approaches for
> population queries, but clinical care still requires that I process millions
> of records for a single patient.
>
> Parallel software frameworks such as Erlang's OTP or Scala's Akka do help a
> lot, but it would be a lot better if I could feed those frameworks with data
> faster. So, what options do I have for executing queries in parallel, assuming
> a transactional system running on PostgreSQL? For example, I'd like to get
> the last 10 years' records in chunks of 2 years of data, or in chunks of 5K
> records, fed to N parallel processing machines. The clinical system should
> keep functioning in the meantime, with new records being added, etc.
> PGPool looks like a good option, but I'd appreciate your input. Any proven
> best practices, architectures, products?
>
> Best regards
> Seref
>
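
The chunking described in the question (the last 10 years split into 2-year windows, or batches of about 5K records, handed to N workers) can be sketched as below. This is a minimal illustration under stated assumptions, not a recommendation from the thread: the helper names (date_windows, id_batches, run_parallel, fake_fetch) and the table/columns in the comment (clinical_entry, patient_id, recorded_at) are hypothetical. The underlying idea is that a single PostgreSQL query of this era runs in one backend process, so parallelism has to come from several connections, each running its own chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from typing import Callable, List, Tuple

Window = Tuple[date, date]


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 does not exist in the target year
        return d.replace(year=d.year + years, day=28)


def date_windows(start: date, end: date, years: int = 2) -> List[Window]:
    """Split [start, end) into consecutive half-open windows of `years` years."""
    windows = []
    lo = start
    while lo < end:
        hi = min(add_years(lo, years), end)  # last window may be shorter
        windows.append((lo, hi))
        lo = hi
    return windows


def id_batches(min_id: int, max_id: int, size: int = 5000) -> List[Tuple[int, int]]:
    """Split the inclusive id range [min_id, max_id] into half-open ranges of `size` ids.

    Gaps in the id sequence mean a batch may hold fewer than `size` rows.
    """
    return [(lo, min(lo + size, max_id + 1)) for lo in range(min_id, max_id + 1, size)]


def run_parallel(chunks, fetch: Callable, workers: int = 4) -> list:
    """Run `fetch` on every chunk using `workers` threads; results keep chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, chunks))


# Hypothetical per-chunk query: in practice each worker would open its own
# database connection and run something like
#   SELECT * FROM clinical_entry
#    WHERE patient_id = %s AND recorded_at >= %s AND recorded_at < %s
# Here the stub just echoes the window so the sketch runs without a database.
def fake_fetch(window: Window) -> str:
    lo, hi = window
    return f"{lo.isoformat()}..{hi.isoformat()}"


if __name__ == "__main__":
    windows = date_windows(date(2002, 8, 21), date(2012, 8, 21))
    for line in run_parallel(windows, fake_fetch):
        print(line)
```

Half-open windows (>= lower bound, < upper bound) keep a row that falls exactly on a boundary from being fetched twice. Because each worker uses a separate connection, chunks may see slightly different data while new records are still being inserted.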


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)