Re: Huge Data sets, simple queries


 




	I ran a small test on software RAID1:

	I have two 800 MB files, say A and B. (RAM is 512 MB.)

	Test 1 :
	1- Read A, then read B:
		19 seconds per file

	2- Read A and B simultaneously using two threads:
		22 seconds total (the RAID parallelized the reads)

	3- Read one block of A, then one block of B, then one block of A, etc. Essentially this is the same as the threaded case, except there's only one thread:
		53 seconds total (with heavy seeking noise from the hdd)

I was half expecting case 3 to take about as long as case 2. It simulates, for instance, scanning a table and its index, or scanning two sort bins. Well, maybe one day...
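For anyone who wants to repeat the comparison, here is a minimal Python sketch of the three access patterns. It is not the program used for the timings above: the function names are made up for illustration, and the 8 KB block size is an assumption, since the test description does not state one. Timing code is left out on purpose; wrap the calls with whatever timer you prefer, and make sure the files are larger than RAM so the page cache does not hide the disk.

```python
import threading

BLOCK = 8192  # assumed block size; the original test does not state one


def read_sequential(path, block=BLOCK):
    """Case 1: read one file from start to end; return the bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_threaded(path_a, path_b):
    """Case 2: read both files at the same time, one thread per file."""
    results = {}

    def worker(path):
        results[path] = read_sequential(path)

    threads = [threading.Thread(target=worker, args=(p,))
               for p in (path_a, path_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[path_a] + results[path_b]


def read_interleaved(path_a, path_b, block=BLOCK):
    """Case 3: single thread, alternating one block of A and one block of B."""
    total = 0
    with open(path_a, "rb", buffering=0) as fa, \
         open(path_b, "rb", buffering=0) as fb:
        pending = [fa, fb]
        while pending:
            # Iterate over a copy so a finished file can be dropped safely.
            for f in list(pending):
                chunk = f.read(block)
                if chunk:
                    total += len(chunk)
                else:
                    pending.remove(f)
    return total
```

Each function returns the total number of bytes read, which is only a sanity check that all three patterns touch the same data.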

It would be nice if the kernel had an API for applications to tell it: "I'm going to need these blocks in the next few seconds; can you read them in whatever order you like (the fastest), from whichever disk you like, and cache them for me, so that I can then read them in the order I like, but very fast?"
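Part of this wish can already be expressed with posix_fadvise() and the POSIX_FADV_WILLNEED advice, which tells the kernel that a byte range of a file will be accessed soon so it can start reading it into the page cache. Below is a small sketch using Python's os.posix_fadvise (Unix only). The prefetch() helper and its (offset, length) list are hypothetical names for illustration. The hint is advisory: the kernel may ignore it, it gives the application no say over which mirror is read, and whether it would improve case 3 above has not been measured here.

```python
import os


def prefetch(path, ranges):
    """Hint that the given (offset, length) byte ranges will be needed soon.

    Hypothetical helper: issues one POSIX_FADV_WILLNEED hint per range and
    returns the number of hints issued. The kernel is free to ignore them.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        issued = 0
        for offset, length in ranges:
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
            issued += 1
        return issued
    finally:
        # Pages read ahead live in the page cache, not in the descriptor,
        # so closing the descriptor does not discard them.
        os.close(fd)
```

A possible use is to call prefetch() for the next window of blocks in both A and B before starting the interleaved reads over that window, keeping the window well below the amount of free RAM so the prefetched pages are not evicted before they are read.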


On Wed, 01 Feb 2006 09:25:13 +0100, Jeffrey W. Baker <jwbaker@xxxxxxx> wrote:

On Tue, 2006-01-31 at 21:53 -0800, Luke Lonergan wrote:
Jeffrey,

On 1/31/06 8:09 PM, "Jeffrey W. Baker" <jwbaker@xxxxxxx> wrote:
>> ... Prove it.
> I think I've proved my point.  Software RAID1 read balancing provides
> 0%, 300%, 100%, and 100% speedup on 1, 2, 4, and 8 threads,
> respectively.  In the presence of random I/O, the results are even
> better.
> Anyone who thinks they have a single-threaded workload has not yet
> encountered the autovacuum daemon.

Good data - interesting case. I presume from your results that you had to make the I/Os non-overlapping (the "skip" option to dd) in order to get the concurrent access to work. Why the particular choice of offset - 3.2GB in this case?

No particular reason.  8k x 100000 is what the last guy used upthread.

So - the bandwidth doubles in specific circumstances under concurrent
workloads - not relevant to "Huge Data sets, simple queries", but possibly
helpful for certain kinds of OLTP applications.

Ah, but someday Pg will be able to concurrently read from two
datastreams to complete a single query.  And that day will be glorious
and fine, and you'll want as much disk concurrency as you can get your
hands on.

-jwb

