Stephan,

On 11/27/05 7:48 AM, "Stephan Szabo" <sszabo@xxxxxxxxxxxxxxxxxxxxx> wrote:

> On Sun, 27 Nov 2005, Luke Lonergan wrote:
>
>> Has anyone done the math on the original post? 5TB takes how long to
>> scan once? If you want to wait less than a couple of days just for a
>> seq scan, you'd better be in the multi-gb per second range.
>
> Err, I get about 31 megabytes/second to do 5TB in 170,000 seconds. I think
> perhaps you were exaggerating a bit or adding additional overhead not
> obvious from the above. ;)

Thanks - the calculator on my blackberry was broken ;-)

> At 1 gigabyte per second, 1 terabyte should take about 1000 seconds
> (between 16 and 17 minutes). The impressive 3.2 gigabytes per second
> listed before (if it actually scans consistently at that rate), puts it at
> a little over 5 minutes I believe for 1, so about 26 for 5 terabytes.
> The 200 megabyte per second number puts it about 7 hours for 5
> terabytes AFAICS.

7 hours, days, same thing ;-)

On the reality of sustained scan rates like that:

We're getting 2.5GB/s sustained on a 2-year-old machine with 16 hosts
and 96 disks. We run them in RAID0, which is only OK because MPP has
built-in host-to-host mirroring for fault management.

We just purchased a 4-way cluster with 8 drives each using the 3Ware
9550SX. Our thought was to try the simplest approach first, which is a
single RAID5, which gets us 7 drives' worth of capacity and performance.
As I posted earlier, we get about 400MB/s seq scan rate on the RAID, but
the current Postgres 8.0 scan rate limit is 64% of 400MB/s, or 256MB/s
per host. The 8.1 mods (thanks Qingqing and Tom!) may increase that
significantly toward the 400 max - we've already merged the 8.1 codebase
into MPP so we'll also feature the same enhancements.

Our next approach is to run these machines in a split RAID0
configuration, or RAID0 on 4 and 4 drives. We then run an MPP "segment
instance" bound to each CPU and I/O channel. At that point, we'll have
all 8 drives of performance and capacity per host and we should get
333MB/s with current MPP and perhaps over 400MB/s with MPP/8.1. That
would get us up to the 3.2GB/s for 8 hosts.

Even better, all operators are executed on all CPUs for each query, so
sorting, hashing, agg, etc. are run on all CPUs in the cluster.

- Luke
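
For anyone who wants to redo the arithmetic in this thread without a broken calculator, here is a minimal Python sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a perfectly constant scan rate, neither of which the thread states. The function names are made up for illustration and are not part of Postgres or MPP. With decimal units, the rate needed for 5TB in 170,000 seconds comes out a little under 30 MB/s; the quoted figure of about 31 is what you get with binary units.

```python
# Back-of-the-envelope sequential scan arithmetic.
# Assumptions (not from the thread): decimal units and a constant scan rate.

MB = 10**6
GB = 10**9
TB = 10**12


def scan_seconds(size_bytes, rate_bytes_per_s):
    """Seconds needed to read size_bytes once at a constant rate."""
    return size_bytes / rate_bytes_per_s


def required_rate(size_bytes, seconds):
    """Constant rate (bytes/s) needed to read size_bytes in the given time."""
    return size_bytes / seconds


def aggregate_rate(hosts, per_host_rate):
    """Cluster-wide scan rate, assuming it scales linearly with host count."""
    return hosts * per_host_rate


if __name__ == "__main__":
    # Scan times for 5 TB at the rates mentioned in the thread.
    for label, rate in [("200 MB/s", 200 * MB),
                        ("1 GB/s", 1 * GB),
                        ("3.2 GB/s", 3.2 * GB)]:
        secs = scan_seconds(5 * TB, rate)
        print(f"5 TB at {label}: {secs / 60:.1f} min ({secs / 3600:.2f} h)")

    # Rate needed to scan 5 TB in 170,000 seconds (about two days).
    print(f"needed: {required_rate(5 * TB, 170_000) / MB:.1f} MB/s")

    # 8 hosts at 400 MB/s each, assuming linear scaling.
    print(f"cluster: {aggregate_rate(8, 400 * MB) / GB:.1f} GB/s")
```

The linear scaling in `aggregate_rate` is the optimistic case; real clusters lose some of that to interconnect and skew, so treat the result as an upper bound.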