I'm playing with various data models to compare their performance and practicality, and I'm not sure whether I should be surprised by the numbers I'm getting. I hope this report isn't too wishy-washy for reasoned comment.

One model defines a genotype as follows:

Table "public.oldstyle"

I loaded 37M+ records into it using jOOQ (batching every 1000 rows) in 12+ hours, i.e. 800+ records/sec.

Then I tried COPY, and killed it after 11.25 hours when I realised that I had added a non-unique index on the name fields after the first load. By that point it was on line 28,301,887, so ~75% done, which implies the full load would have taken ~15 hours. Would the overhead of maintaining that index likely explain the decrease in throughput?

Impatience got the better of me and I killed the second COPY as well. This time it had done 54% of the file in 6.75 hours, which extrapolates to roughly 12.5 hours for the whole thing. That matches up with the Java speed. I'm not sure whether to be elated with jOOQ or disappointed with COPY. (Rough sketches of both load paths are at the end of this mail.)

Btw, I can load roughly the same data into the model below in 10.5 seconds; it comes to only 39 very wide rows. I haven't got to the practicality bits yet :)

Table "public.chipcall"

This is just a dev desktop environment:

RHEL 6.5
PostgreSQL 9.3.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4 [hm, interesting contradiction]), 64-bit
2 quad-core CPUs @ 2925.878 MHz
The input file and the postgres data directory are on separate disks, but share one controller.

Thanks in advance, even if you only read this far.
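P.S. For anyone checking my extrapolations, they're just linear scaling of the observed progress (37,000,000 is my round figure for the row count):

    COPY #1: 28,301,887 / 37,000,000 ≈ 0.76 done in 11.25 h  ->  11.25 / 0.76 ≈ 15 h total
    COPY #2: 0.54 done in 6.75 h                             ->  6.75 / 0.54 ≈ 12.5 h total
    jOOQ:    37,000,000 rows / (12 h * 3600 s/h)             ≈  ~860 rows/s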
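The jOOQ load loop looked roughly like the sketch below. Treat it as a minimal reconstruction, not my exact code: the connection string, file name, and the oldstyle column names are placeholders, since the actual table definitions are trimmed from this mail.

    import org.jooq.DSLContext;
    import org.jooq.Query;
    import org.jooq.SQLDialect;
    import org.jooq.impl.DSL;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.ArrayList;
    import java.util.List;

    public class OldstyleLoad {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/testdb", "me", "secret");
                 BufferedReader in = new BufferedReader(new FileReader("genotypes.tsv"))) {

                DSLContext dsl = DSL.using(conn, SQLDialect.POSTGRES);
                List<Query> batch = new ArrayList<>();

                String line;
                while ((line = in.readLine()) != null) {
                    String[] f = line.split("\t");
                    // Placeholder column names -- the real oldstyle schema is trimmed above.
                    batch.add(dsl.insertInto(DSL.table("oldstyle"),
                                             DSL.field("sample_name"),
                                             DSL.field("marker_name"),
                                             DSL.field("genotype"))
                                 .values(f[0], f[1], f[2]));

                    if (batch.size() == 1000) {     // flush every 1000 rows
                        dsl.batch(batch).execute();
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    dsl.batch(batch).execute();     // final partial batch
                }
            }
        }
    }

Each dsl.batch(...).execute() round trip carries 1000 single-row INSERTs.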
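For the COPY runs, the equivalent driven from Java via pgjdbc's CopyManager would look something like this (assuming the input file is already in COPY's default tab-delimited text format; again, all names are placeholders):

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class OldstyleCopy {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/testdb", "me", "secret");
                 BufferedReader in = new BufferedReader(new FileReader("genotypes.tsv"))) {

                CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
                // Streams the whole file through one COPY protocol exchange.
                long rows = copy.copyIn("COPY oldstyle FROM STDIN", in);
                System.out.println("loaded " + rows + " rows");
            }
        }
    }

Unlike the batched INSERTs, copyIn pushes everything through a single COPY command, so the per-row overhead is essentially all on the server side.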