Jim,

On 3/17/06 9:36 AM, "Jim C. Nasby" <jnasby@xxxxxxxxxxxxx> wrote:

> Now what happens as soon as you start doing random I/O? :)

Well - given that we've divided the data into 32 separate segments, and that seeking is done in parallel over all 256 disk drives, random I/O rocks hard and scales. Of course, the parallelizing planner is designed to minimize seeking as much as possible, as is the normal Postgres planner, but with more segments and more parallel platters, seeking gets faster.

The biggest problem with this idea of "put huge amounts of data on your SSD and everything is infinitely fast" is that it ignores several critical scaling factors:

- How much bandwidth is available in and out of the device?
- Does that bandwidth scale as you grow the data?
- As you grow the data, how long does it take to use it?
- Can more than one CPU use the data at once? Do they share the path to the data?

If you are accessing 3 rows at a time from among billions, your problem is mostly access time, so an SSD might be very good for some OLTP applications. However, the idea of pushing terabytes of data into an SSD through a thin straw of a channel is silly (see the rough numbers in the PS below).

Note that SSDs have been around for a *long* time. I was using them on Cray X/MP and Cray 2 supercomputers back in 1987-92, when we had a 4 Million Word SSD connected over a 2GB/s channel. In fact, some people I worked with built a machine with four Cray 2 computers that shared an SSD between them for parallel computing. It was very effective, and also ungodly expensive and special purpose.

- Luke
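
PS: For anyone who wants to put rough numbers on the "thin straw" vs.
parallel-seek argument, here is a tiny Python sketch. The bandwidths, seek
times, and drive counts are illustrative assumptions I picked for the
example, not measurements from any of the systems mentioned above.

    TB = 1e12  # bytes

    def scan_time_s(data_bytes, channel_gb_per_s):
        # Seconds to stream data_bytes through one channel of the given bandwidth.
        return data_bytes / (channel_gb_per_s * 1e9)

    def random_iops(n_drives, seek_ms):
        # Aggregate random IOPS when seeks run in parallel on independent drives.
        return n_drives * (1000.0 / seek_ms)

    # Bandwidth-bound case: a 1 TB table behind a single channel ("thin straw").
    for gbps in (0.3, 2.0, 25.0):
        print(f"scan 1 TB over {gbps:5.1f} GB/s: {scan_time_s(TB, gbps):8.1f} s")

    # Seek-bound case: small OLTP lookups among billions of rows.
    for drives in (1, 32, 256):
        print(f"{drives:3d} drives @ 8 ms/seek: {random_iops(drives, 8.0):7.0f} random IOPS")

The point the numbers make: adding drives multiplies random IOPS roughly
linearly, while a single channel caps total scan time no matter how fast the
device sitting behind it is.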