Hello

2012/8/21 Seref Arikan <serefarikan@xxxxxxxxxxxxxxxxxxxxx>:
> Dear all,
> I am designing an electronic health record repository which uses PostgreSQL
> as its RDBMS technology. For those who may find the topic interesting, the
> EHR standard I specialize in is openEHR: http://www.openehr.org/
>

http://stormdb.com/community/stado?destination=node%2F8

Regards

Pavel Stehule

> My design makes use of parallel execution in the layers above the DB, and it
> seems to scale quite well. However, I have a scale problem at hand. A single
> patient can have up to 1 million different clinical data entries on his/her
> own, after a few decades of usage. Clinicians do love their data, and
> especially in chronic disease management, they demand access to whatever
> data exists. If you have 20 years of data for a diabetic patient for
> example, they'll want to look for trends in that, or even scroll through all
> of it, maybe with some filtering.
> My requirement is to be able to process those 1 million records as fast as
> possible. In case of population queries, we're talking about billions of
> records. Each clinical record (even with all the optimizations our domain
> has developed in the last 30 or so years) leads to a number of rows, so you
> can see that this is really big data. (Imagine a national diabetes registry
> with lifetime data of a few million patients.)
> I am ready to consider Hadoop or other non-transactional approaches for
> population queries, but clinical care still requires that I process millions
> of records for a single patient.
>
> Parallel software frameworks such as Erlang's OTP or Scala's Akka do help a
> lot, but it would be a lot better if I could feed those frameworks with data
> faster. So, what options do I have to execute queries in parallel, assuming
> a transactional system running on PostgreSQL?
> For example, I'd like to get the
> last 10 years' records in chunks of 2 years of data, or chunks of 5K
> records, fed to N parallel processing machines. The clinical
> system should keep functioning in the meantime, with new records added etc.
> PGPool looks like a good option, but I'd appreciate your input. Any proven
> best practices, architectures, products?
>
> Best regards
> Seref
>

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general