Re: Pg 16: will pg_dump & pg_restore be faster?

Andres Freund <andres@xxxxxxxxxxx> · Wed, 31 May 2023 06:45:52 -0700

Hi,

On 2023-05-30 21:13:08 -0400, Bruce Momjian wrote:
> On Wed, May 31, 2023 at 09:14:20AM +1200, David Rowley wrote:
> > On Wed, 31 May 2023 at 08:54, Ron <ronljohnsonjr@xxxxxxxxx> wrote:
> > > https://www.postgresql.org/about/news/postgresql-16-beta-1-released-2643/
> > > says "PostgreSQL 16 can also improve the performance of concurrent bulk
> > > loading of data using COPY up to 300%."
> > >
> > > Since pg_dump & pg_restore use COPY (or something very similar), will the
> > > speed increase translate to higher speeds for those utilities?
> > 
> > I think the improvements to relation extension only help when multiple
> > backends need to extend the relation at the same time.  pg_restore can
> > have multiple workers, but the tasks that each worker performs are
> > only divided as far as an entire table, i.e. 2 workers will never be
> > working on the same table at the same time. So there is no concurrency
> > in terms of 2 or more workers working on loading data into the same
> > table at the same time.
> > 
> > It might be an interesting project now that we have TidRange scans, to
> > have pg_dump split larger tables into chunks so that they can be
> > restored in parallel.
> 
> Uh, the release notes say:
> 
> 	<!--
> 	Author: Andres Freund <andres@xxxxxxxxxxx>
> 	2023-04-06 [00d1e02be] hio: Use ExtendBufferedRelBy() to extend tables more eff
> 	Author: Andres Freund <andres@xxxxxxxxxxx>
> 	2023-04-06 [26158b852] Use ExtendBufferedRelTo() in XLogReadBufferExtended()
> 	-->
> 	
> 	<listitem>
> 	<para>
> 	Allow more efficient addition of heap and index pages (Andres Freund)
> 	</para>
> 	</listitem>
> 
> There is no mention of concurrency being a requirement.  Is it wrong?  I
> think there was a question of whether you had to add _multiple_ blocks
> to get a benefit, not if concurrency was needed.  This email about the
> release notes didn't mention the concurrent requirement:
> 
> 	https://www.postgresql.org/message-id/20230521171341.jjxykfsefsek4kzj%40awork3.anarazel.de

There are multiple improvements that work together to produce the overall
speedup. One part is cheaper filesystem interaction when extending a
relation, another is holding the relation extension lock for a *much*
shorter time. The former helps regardless of concurrency, the latter only
with concurrency.
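
To make the distinction concrete, here is a toy model in Python. It is only
a sketch: the function, the scenario names and every number in it are
invented for illustration, and none of it is a measurement of PostgreSQL. It
assumes each relation extension has a part done while holding the extension
lock (serialized across backends) and a part done outside the lock (which can
overlap between backends).

```python
def elapsed(workers, extensions, locked, unlocked):
    """Rough lower bound on elapsed time, in arbitrary time units.

    workers    -- backends extending the same relation (hypothetical)
    extensions -- number of extensions each backend performs
    locked     -- cost per extension spent holding the extension lock
    unlocked   -- cost per extension spent outside the lock
    """
    # Each backend must at least do all of its own work.
    per_worker = extensions * (locked + unlocked)
    # Work done under the lock cannot overlap between backends.
    serialized = workers * extensions * locked
    return max(per_worker, serialized)


# Invented scenarios as (locked, unlocked) cost per extension.
SCENARIOS = {
    "all work under the lock": (10, 0),
    "shorter lock hold only": (1, 9),
    "shorter lock hold + cheaper filesystem work": (1, 4),
}

if __name__ == "__main__":
    for workers in (1, 8):
        for name, (locked, unlocked) in SCENARIOS.items():
            t = elapsed(workers, 100, locked, unlocked)
            print(f"{workers} worker(s), {name}: {t}")
```

In this model, moving work out from under the lock changes nothing for a
single backend, while reducing the total work per extension helps for any
number of backends.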

Regards,

Andres