Enhance pg_dump multi-threaded streaming (WAS: Re: filesystem full during vacuum - space recovery issues)

Thomas Simpson <ts@xxxxxxxxxxxxxx> · Thu, 18 Jul 2024 19:08:06 -0400



    [Added cross post to pgsql-hackers@xxxxxxxxxxxxxxxxxxxx -
      background is multi-TB database needs recovered via pgdumpall
      & reload, thoughts on ways to make pg_dump scale to
      multi-thread to expedite loading to a new cluster.  Straight dump
      to a file is impractical as the dump will be >200TB; hackers
      may be a better home for the discussion than current admin list]

    
    Hi Ron
    

    On 18-Jul-2024 18:41, Ron Johnson
      wrote:

    
        Multi-threaded writing to the same giant text
          file won't work too well, when all the data for one table
          needs to be together.
        

        Just temporarily add another disk for backups.
        

    For clarity, I'm not proposing multi threaded writing to one
      file; the proposal is a new special mode which specifically makes
      multiple output streams across *network sockets* to a listener
      which is listening on the other side.  The goal is avoiding any
      files at all and only using multiple network streams to gain
      multi-threaded processing with some co-ordination to keep things
      organized and consistent.
    This would really be specifically for the use-case of dump/reload
      upgrade or recreate rather than everyday use.  And particularly
      for very large databases.

    
    Looking at pg_dump.c it's doing the baseline organization but the
      extension would be adding the required coordination with the
      destination.  So, for a huge table (I have many) these would go in
      different streams but if there is a dependency (FK relations etc)
      the checkpoint needs to ensure those are met before proceeding. 
      Worst case scenario it would end up using only 1 thread but it
      would be very unusual to have a database where every table depends
      on another table all the way down.
    In theory at least, some gains should be achieved for typical
      databases where a degree of parallelism is possible.

    
    Thanks
    Tom

    
          On Thu, Jul 18, 2024 at
            4:55 PM Thomas Simpson <ts@xxxxxxxxxxxxxx>
            wrote:

          
              On 18-Jul-2024 16:32, Ron Johnson wrote:

              
                  On Thu, Jul 18, 2024 at 3:01 PM Thomas
                    Simpson <ts@xxxxxxxxxxxxxx>
                    wrote:
                  [snip]
                  
                    
                        [BTW, v9.6 which I know is old but this
                          server is stuck there]

                        
                    [snip] 
                    
                      
                        I know I'm stuck with the slow rebuild at
                          this point.  However, I doubt I am the only
                          person in the world that needs to dump and
                          reload a large database.  My thought is this
                          is a weak point for PostgreSQL so it makes
                          sense to consider ways to improve the dump
                          reload process, especially as it's the
                          last-resort upgrade path recommended in the
                          upgrade guide and the general fail-safe route
                          to get out of trouble.
                      
                    
                     No database does fast single-threaded backups.
                  
                
              Agreed.  My thought is that is should be possible for a
                'new dumpall' to be multi-threaded.
              Something like :
              * Set number of threads on 'source' (perhaps by
                querying a listening destination for how many threads it
                is prepared to accept via a control port)

              
              * Select each database in turn
              * Organize the tables which do not have references
                themselves
              * Send each table separately in each thread (or queue
                them until a thread is available)  ('Stage 1')

              
              * Rendezvous stage 1 completion (pause sending, wait
                until feedback from destination confirming all
                completed) so we have a known consistent state that is
                safe to proceed to subsequent tables

              
              * Work through tables that do refer to the previously
                sent in the same way (since the tables they reference
                exist and have their data) ('Stage 2')

              
              * Repeat progressively until all tables are done
                ('Stage 3', 4 etc. as necessary)

              
              The current dumpall is essentially doing this table
                organization currently [minus stage
                checkpoints/multi-thread] otherwise the dump/load would
                not work.  It may even be doing a lot of this for
                'directory' mode?  The change here is organizing n
                threads to process them concurrently where possible and
                coordinating the pipes so they only send data which can
                be accepted.
              The destination would need to have a multi-thread
                listen and co-ordinate with the sender on some control
                channel so feed back completion of each stage.
              Something like a destination host and control channel
                port to establish the pipes and create additional netcat
                pipes on incremental ports above the control port for
                each thread used.
              Dumpall seems like it could be a reasonable start point
                since it is already doing the complicated bits of
                serializing the dump data so it can be consistently
                loaded.

              
              Probably not really an admin question at this point,
                more a feature enhancement.
              Is there anything fundamentally wrong that someone with
                more intimate knowledge of dumpall could point out?
              Thanks
              Tom