Re: Benchmark Data requested --- pgloader CE design ideas

On Wed, 6 Feb 2008, Simon Riggs wrote:

> For me, it would be good to see a --parallel=n parameter that would
> allow pg_loader to distribute rows in "round-robin" manner to "n"
> different concurrent COPY statements. i.e. a non-routing version.

Let me expand on this. In many of these giant COPY situations, the bottleneck is plain old sequential I/O to a single process. You can almost predict how fast the rows will load by timing a dd of the file. Having a process that pulls rows in and distributes them round-robin is good, but it won't crack that bottleneck. The useful approaches I've seen for other databases all presume the data files involved are large enough that, on big hardware, you can start multiple processes reading at different points in the file and beat anything possible with a single reader.
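To make that concrete, here's a minimal sketch (Python, using psycopg2) of the round-robin scheme Simon describes. The connection string, table name, and the assumption that the input file is already in COPY text format are all placeholders of mine, not anything from pgloader itself:

import queue
import threading

import psycopg2

DSN = "dbname=test"      # hypothetical connection string
TABLE = "big_table"      # hypothetical target table


class QueueFile:
    """File-like adapter so copy_expert() can pull rows off a queue."""
    def __init__(self, q):
        self.q = q

    def read(self, size=-1):
        line = self.q.get()        # None marks end-of-stream
        return line if line is not None else b""


def copy_worker(q):
    # Each worker owns one connection and one COPY statement.
    conn = psycopg2.connect(DSN)
    with conn, conn.cursor() as cur:
        cur.copy_expert("COPY %s FROM STDIN" % TABLE, QueueFile(q))
    conn.close()


def round_robin_load(path, n=4):
    qs = [queue.Queue(maxsize=1000) for _ in range(n)]
    workers = [threading.Thread(target=copy_worker, args=(q,)) for q in qs]
    for w in workers:
        w.start()
    with open(path, "rb") as f:    # the single sequential reader
        for i, line in enumerate(f):
            qs[i % n].put(line)    # deal rows out round-robin
    for q in qs:
        q.put(None)                # tell each worker it's done
    for w in workers:
        w.join()

Note that even with n concurrent COPY statements, every row still flows through the single sequential reader at the bottom, which is exactly the bottleneck described above.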

If I'm loading a TB file, odds are good I can split that into 4 or more pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders at once, and get far more than one disk's worth of read throughput. You have to tune the exact number, because if you push the split too far you introduce seek overhead instead of improvement, but that's the basic design I'd like to see one day. For the cases I'm thinking about, parallel loading isn't really useful until something like this comes along.
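Here's a rough sketch of that offset-split design, under the same assumptions as the previous snippet (psycopg2, input already in COPY text format, placeholder names). Each loader seeks to its starting byte offset, skips the partial row it lands in, and reads one row past its end boundary, so every row is loaded exactly once:

import os
from multiprocessing import Process

import psycopg2

DSN = "dbname=test"      # hypothetical connection string
TABLE = "big_table"      # hypothetical target table


class LineSlice:
    """File-like object yielding whole rows from roughly [start, end).

    A slice skips the partial row it lands in (the previous slice reads
    past its own end to finish that row), so each row loads exactly once.
    """
    def __init__(self, path, start, end):
        self.f = open(path, "rb")
        self.end = end
        self.f.seek(start)
        if start > 0:
            self.f.readline()      # partial row; owned by previous slice
        self.pos = self.f.tell()

    def read(self, size=-1):
        if self.pos > self.end:    # crossed our boundary: done
            return b""
        line = self.f.readline()   # may read past self.end; that's how
        self.pos = self.f.tell()   # a boundary-straddling row stays whole
        return line


def load_slice(path, start, end):
    conn = psycopg2.connect(DSN)
    with conn, conn.cursor() as cur:
        cur.copy_expert("COPY %s FROM STDIN" % TABLE,
                        LineSlice(path, start, end))
    conn.close()


def parallel_load(path, n=4):
    size = os.path.getsize(path)
    step = size // n
    bounds = [(i * step, size if i == n - 1 else (i + 1) * step)
              for i in range(n)]
    procs = [Process(target=load_slice, args=(path, s, e))
             for s, e in bounds]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

The boundary convention (skip the partial first row, finish the row that straddles the end) is what lets the loaders start at arbitrary byte offsets without coordinating with each other.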

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

