Re: pg_dump/pg_restore --jobs practical limit?

Tomek <tomekphotos@xxxxxxxxx> · Thu, 2 Nov 2023 10:07:50 +0100

Hi!
When I really want to use all the resources - I set the number of jobs to a value equal to the number of CPU plus 1. Probably there is no reason to make run more jobs than number of CPU :-) .
Unfortunately, pg_dump will not allocate more than one thread to a table, even a huge one (unless it is partitioned) - so, sometimes it is no sense to define many jobs when you have one or two big tables and the rest of them are relatively small - in such situation there will be no difference if you define 4 or 10 jobs.
But, yes testing is the best way to get known :-).

   Regards Tomek

czw., 2 lis 2023 o 02:20 Ron <ronljohnsonjr@xxxxxxxxx> napisał(a):

    On 11/1/23 20:05, Brad White wrote:

          From:
            Ron <ronljohnsonjr@xxxxxxxxx>

            Sent: Thursday, November 2, 2023 3:01:47 AM

            To: pgsql-general@xxxxxxxxxxxxxxxxxxxx
            <pgsql-general@xxxxxxxxxxxxxxxxxxxx>

            Subject: Re: pg_dump/pg_restore --jobs
            practical limit?

          On 11/1/23 15:42, Laurenz Albe wrote:

            On Wed, 2023-11-01 at 13:09 -0500, Ron wrote:

              I will "soon" migrate some 1+ TB database from Pg 9.6.24 on RHEL 6 VMs to Pg 
14.latest on RHEL 8 VMs.  The VMs have 10Gbps "NICs", SAN-based LUNs managed 
by LVM, and are all on ESX blades.  nproc count on some is 16 and on others 
is 32.

Does anyone have experience as to the point of diminishing returns?

IOW, can I crank them processes up to --jobs=30, will I see no gain -- or 
even degradation -- after, for example, --jobs=24?

This would be for both pg_dump and pg_restore (which would be run on the 
RHEL 8 VM).

            Test, test, test.  Theoretical considerations are pretty worthless,

          Which is why I asked if anyone has experience.

             and it is easy to measure that.

          Not necessarily. Our test systems are way too small (only good enough to validate that the script works correctly), and there's always something (sometimes a lot, sometime just "some") going on in production, whether it's
 my customer's work, or the SAN (like snapshotting every VM and then copying the snapshots to the virtual tape device) or something else.

          Sure,  but are the new
            systems busy already?
          Ideally you could run
            tests on them before they are put into production. 

    Testing pg_restore with different --jobs= values will be easier.  
    pg_dump is what's going to be reading from a constantly varying
    system.

    -- 

      Born in Arizona, moved to Babylonia.